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Docket No.: PA-0033 US 

REMARKS 

The specification has been amended to delete reference to a web site recited at page 12, 
line 31 of the application in association with a software provider, and to recite instead, the address of 
the provider. No new matter is added by this amendment, and entry of the amendment is requested. 
Election/Restriction 

The Examiner stated that applicants election, with traverse of Group I and the ten elected 
sequences 1-3, 6, 8-11, 20 and 22 is acknowledged. Applicant's argument regarding the examination 
of claims 1-5 and 1 1-14 has been found persuasive and are therefore are hereby being examined but 
the argument regarding ten elected sequences has not been found persuasive. However, three SEQ 
IDs 1, 3 and 8 have been searched. The requirement is deemed proper and is therefor made FINAL. 
Applicants Response 

Applicants thank the Examiner for his reconsideration of the Restriction Requirement and 
examination of SEQ ID NOs:l, 3 and 8 relative to claims 1-5 and 1 1-14. 
Specification 

The Examiner has objected to the disclosure because it contains an embedded hyperlink and/or 
other form of browser-executable code at page 12, line 31. Applicant is required to delete the 
embedded hyperlink and/or other form of browser-executable code. 

The specification has been so amended and withdrawal of the objection is requested. 
35 U.S.C. § 10L Rejection of Claims 1-5 and 11-14 

The Examiner has rejected claims 1-5 and 11-14 under 35 U.S.C. § 101 because the claimed 
invention lacks patentable utility due to its not being supported by either a specific and substantial utility 
or a well established utility. 

The Examiner stated that the claimed compounds are supported by a specific and substantial 
asserted utility because the disclosed uses(s) of the nucleic acids are not specific and are generally 
applicable to any nucleic acid. The specification states that the nucleic acid compounds may be useful 
as probes for assisting in the isolation of full-length cDNAs or genes which would be used to make 
protein and optionally further usage to make antibodies, gene mapping isolation of homologous 
sequences, detection of gene expression, chromosomal markers and numerous other generic genetic 
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engineering uses. These uses are non-specific that are applicable to nucleic acids in general and are not 

particular or specific to the nucleic acids being claimed. 

The Examiner stated that a nucleic acid may be utilized to obtain a protein. However, 
identifying and studying the properties of a protein itself or the mechanisms in which the protein is 
involved does not define a "real world" context of use. Similarly, the other listed and asserted utilities 
as summarized above or in the instant specification are neither substantial or specific due to being 
generic in nature and applicable to a myriad of such compounds. Neither the specification as filed nor 
any art of record discloses or suggests any property or activity for the nucleic acid and/or protein 
compound(s) such that another non-asserted utility would be well established for the compounds. 
Applicants Response 

Applicants disagree that the instant application does not support a specific and substantial 
asserted utility or a well established utility that would be readily apparent to one of skill in the art based 
on the disclosure in the specification and the conventional knowledge of one skilled in the art. 

In particular, applicants disagree that the specification does not disclose a utility that is 
sufficiently "specific" to the claimed nucleic acids. This does not preclude, however, a general utility, 
contrary to the statement in the Training Materials where "specific utility" is defined (page 5). Practical 
real -world uses are not limited to uses that are unique to an invention. The law requires that the 
practical utility be "definite," not particular. Montedison, 664 F.2d at 375. 

The specification discloses that the claimed nucleic acids representing genes that are 
differentially expressed during adipocyte differentiation and which may be used to diagnose, stage, treat 
or to monitor the progression or treatment of a disorder associated with adipocyte differentiation. 
These disorders specifically include obesity, type II diabetes, lipodystrophy, and hyperinsulinemia. See 
specification at page 7, lines 30-33. This utility is fully supported by the specification at page 8 and 
Tables 2-5 showing that the claimed polynucleotides represent genes exhibiting significant (at least 2- 
fold) differential expression in differentiated human daipocytes. Clearly, this is not a utility that is 
"applicable to nucleic acids in general" as the Examiner alleges. 

In addition, the claimed invention has numerous practical, beneficial uses in toxicology testing 
and drug development, none of which requires knowledge of how the polypeptide coded for by the 
polynucleotide actually functions. 
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In support of the above statement, Applicants submit three expert Declarations under 37 

C.F.R. § 1.132, with respective attachments, and ten (10) scientific references filed before the July 28, 

2000 priority date of the instant application. The Rockett Declaration, Iyer Declaration, Bedilion 

Declaration, and the ten (10) references fully establish that, prior to the July 28, 2000 filing date of the 

parent Schebye '470 application, it was well-established in the art that: 

polynucleotides derived from nucleic acids expressed in one or more 
tissues and/or cell types can be used as hybridization probes — that is, as tools — to 
survey for and to measure the presence, the absence, and the amount of expression 
of their cognate gene; 

with sufficient length, at sufficient hybridization stringency, and with 
sufficient wash stringency — conditions that can be routinely established - expressed 
polynucleotides, used as probes, generate a signal that is specific to the cognate 
gene, that is, produce a gene-specific expression signal; 

expression analysis is useful, inter alia, in drug discovery and lead 
optimization efforts, in toxicology, particularly toxicology studies conducted early in 
drug development efforts, and in phenotypic characterization and categorization of 
cell types, including neoplastic cell types; 

each additional gene-specific probe used as a tool in expression 
analysis provides an additional gene-specific signal that could not otherwise have 
been detected, giving a more comprehensive, robust, higher resolution, statistically 
more significant, and thus more useful expression pattern in such analyses than would 
otherwise have been possible; 

biologists, such as toxicologists, recognize the increased utility of 
more comprehensive, robust, higher resolution, statistically more significant results, 
and thus want each newly identified expressed gene to be included in such an 
analysis; 

nucleic acid microarrays increase the parallelism of expression 
measurements, providing expression data analogous to that provided by older, lower 
throughput techniques, but at substantially increased throughput; 

accordingly, when expression profiling is performed using 
microarrays, each additional gene-specific probe that is included as a signaling 
component on this analytical device increases the detection range, and thus 
versatility, of this research tool; 

biologists, such as toxicologists, recognize the increased utility of 
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such improved tools, and thus want a gene-specific probe to each newly identified 
expressed gene to be included in such an analytical device; 

the industrial suppliers of microarrays recognize the increased utility 
of such improved tools to their customers, and thus strive to improve salability of 
their microarrays by adding each newly identified expressed gene to the microarrays 
they sell; 

it is not necessary that the biological function of a gene be known for 
measurement of its expression to be useful in drug discovery and lead optimization 
analyses, toxicology, or molecular phenotyping experiments; 

failure of a probe to detect changes in expression of its cognate gene 
does not diminish the usefulness of the probe as a research tool; and 

failure of a probe completely to detect its cognate transcript in any 
single expression analysis experiment does not deprive the probe of usefulness to the 
community of users who would use it as a research tool. 

The Patent Examiner does not dispute that the claimed polynucleotide can be used as a probe 
in cDNA microarrays and used in gene expression monitoring applications. Instead, the Patent 
Examiner contends that the claimed polynucleotide cannot be useful without precise knowledge of its 
biological function, or the biological function of the polypeptide it encodes. But the law has never 
required knowledge of biological function to prove utility. It is the claimed invention's uses, not its 
functions, that are the subject of a proper analysis under the utility requirement. 

In any event, as demonstrated by the Rockett Declaration, the Iyer Declaration, and the 
Bedilion Declaration, the person of ordinary skill in the art can achieve beneficial results from the 
claimed polynucleotide in the absence of any knowledge as to the precise function of the protein 
encoded by it. The uses of the claimed polynucleotide in gene expression monitoring applications are in 
fact independent of its precise biological function. 

I. The applicable legal standard 

To meet the utility requirement of sections 101 and 1 12 of the Patent Act, the patent applicant 
need only show that the claimed invention is "practically useful," Anderson v. Natta, 480 F.2d 1392, 
1397, 178 USPQ 458 (CCPA 1973) and confers a "specific benefit" on the public. Brenner v. 
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Manson, 383 U.S. 519, 534-35, 148 USPQ 689 (1966). As discussed in a recent Court of Appeals 

for the Federal Circuit case, this threshold is not high: 

An invention is "useful" under section 101 if it is capable of providing some identifiable benefit. 
See Brenner v. Manson, 383 U.S. 519, 534 [148 USPQ 689] (1966); Brooktree Corp. v. 
Advanced Micro Devices, Inc., 977 F.2d 1555, 1571 [24 USPQ2d 1401] (Fed. Cir. 1992) 
("to violate Section 101 the claimed device must be totally incapable of achieving a useful 
result"); Fuller v. Berger, 120 F. 274, 275 (7th Cir. 1903) (test for utility is whether invention 
"is incapable of serving any beneficial end"). 

Juicy Whip Inc. v. Orange Bang Inc., 5 1 USPQ2d 1700 (Fed. Cir. 1999). 

While an asserted utility must be described with specificity, the patent applicant need not 

demonstrate utility to a certainty. In Stiftung v. Renishaw PLC, 945 F.2d 1 173, 1 180, 20 USPQ2d 

1094 (Fed. Cir. 1991), the United States Court of Appeals for the Federal Circuit explained: 

An invention need not be the best or only way to accomplish a certain result, and it need only 
be useful to some extent and in certain applications: "[T]he fact that an invention has only limited 
utility and is only operable in certain applications is not grounds for finding lack of utility." 
Envirotech Corp. v. Al George, Inc., 730 F.2d 753, 762, 221 USPQ 473, 480 (Fed. Cir. 
1984). 

The specificity requirement is not, therefore, an onerous one. If the asserted utility is described 
so that a person of ordinary skill in the art would understand how to use the claimed invention, it is 
sufficiently specific. See Standard Oil Co. v. Montedison, S.p.a., 212 U.S.P.Q. 327, 343 (3d Cir. 
1981). The specificity requirement is met unless the asserted utility amounts to a "nebulous expression" 
such as "biological activity" or "biological properties" that does not convey meaningful information 
about the utility of what is being claimed. Cross v. lizuka, 753 F.2d 1040, 1048 (Fed. Cir. 1985). 

In addition to conferring a specific benefit on the public, the benefit must also be "substantial." 
Brenner, 383 U.S. at 534. A "substantial" utility is a practical, "real-world" utility. Nelson v. Bowler, 
626 F.2d 853, 856, 206 USPQ 881 (CCPA 1980). 

If persons of ordinary skill in the art would understand that there is a "well-established" utility 
for the claimed invention, the threshold is met automatically and the applicant need not make any 
showing to demonstrate utility. Manual of Patent Examining Procedure at § 706.03(a). Only if there is 
no "well-established" utility for the claimed invention must the applicant demonstrate the practical 
benefits of the invention. Id. 
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Once the patent applicant identifies a specific utility, the claimed invention is presumed to 
possess it. In re Cortright, 165 R3d 1353, 1357, 49 USPQ2d 1464 (Fed. Cir. 1999); In re Brana, 
51 F.3d 1560, 1566; 34 USPQ2d 1436 (Fed. Cir. 1995). In that case, the Patent Office bears the 
burden of demonstrating that a person of ordinary skill in the art would reasonably doubt that the 
asserted utility could be achieved by the claimed invention. Id. To do so, the Patent Office must 
provide evidence or sound scientific reasoning. See In re hanger, 503 F.2d 1380, 1391-92, 183 
USPQ 288 (CCPA 1974). If and only if the Patent Office makes such a showing, the burden shifts to 
the applicant to provide rebuttal evidence that would convince the person of ordinary skill that there is 
sufficient proof of utility. Brana, 51 F.3d at 1566. The applicant need only prove a "substantial 
likelihood" of utility; certainty is not required. Brenner, 383 U.S. at 532. 

II The uses of the claimed invention in disease detection and diagnosis, in toxicology 

testing and drug discovery are sufficient utilities under 35 U.S.C. §§ 101 and 112, first 
paragraph 

The claimed invention meets all of the necessary requirements for establishing a credible utility 
under the Patent Law: There are "well-established" uses for the claimed invention known to persons of 
ordinary skill in the art, and there are specific practical and beneficial uses for the invention disclosed in 
the patent application's specification. These uses are explained, in detail, in the Rockett Declaration, 
the Bedilion Declaration, and the Iyer Declaration accompanying this brief. Objective evidence, not 
considered by the Patent Office, further corroborates the credibility of the asserted utilities. 

A. The uses of claimed polynucleotides for toxicology testing, drug discovery, and 
disease diagnosis are practical uses that confer "specific benefits" to the public 

The claimed invention has specific, substantial, real-world utility by virtue of its use in toxicology 
testing, drug development and disease diagnosis through gene expression profiling. These uses are 
explained in detail in the accompanying Rockett Declaration, Iyer Declaration, and Bedilion 
Declaration, the substance of which is not rebutted by the Patent Examiner. There is no dispute that the 
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claimed invention is in fact a useful tool in cDNA microarrays used to perform gene expression analysis. 
That is sufficient to establish utility for the claimed polynucleotide. 

The instant application claims priority to USSN 60/222,470, filed July 28, 2000, having the 
identical specification (hereinafter "the Schebye '470 application"). 

In his Declaration, Dr. Rockett explains the many reasons why a person skilled in the art in 
2000 would have understood that any expressed polynucleotide is useful for a number of gene 
expression monitoring applications, e.g., in cDNA microarrays, in connection with the development of 
drugs and the monitoring of the activity of such drugs. (Rockett Declaration at, e.g., ffl 10-18). 

It is my opinion, therefore, based on the state of the art in toxicology at least since the mid- 
1990s . . . that disclosure of the sequence of a new gene or protein, with or without 
knowledge of its biological function, would have been sufficient information for a 
toxicologist to use the gene and/or protein in expression profiling studies in toxicology. 1 
[Rockett Declaration, f 18.] 

In his Declaration, Dr. Bedilion explains why a person of skill in the art in 2000 would have 
understood that any expressed polynucleotide is useful for gene expression monitoring applications 
using cDNA microarrays. (Second Bedilion Declaration, e.g., U 4-7.) In his Declaration, Dr. Iyer 
explains why a person of skill in the art in 2000 would have understood that any expressed 
polynucleotide is useful for gene expression monitoring applications using cDNA microarrays, stating 
that "[t]o provide maximum versatility as a research tool, the microarray should include - and as a 
biologist I would want my microarray to include - each newly identified gene as a probe." (Iyer 
Declaration, f 9.) 

In addition, Dr. Rockett explains in his Declaration that "there are a number of other differential 
expression analysis technologies that precede the development of microarrays, some by decades, and 
that have been applied to drug metabolism and toxicology research, including: (1) differential screening; 
(2) subtractive hybridization, including variants such as chemical cross-linking subtraction, suppression- 
PCR subtractive hybridization and representational difference analysis; (3) differential display; (4) 

"Use of the words 'it is my opinion' to preface what someone of ordinary skill in the art would 
have known does not transform the factual statements contained in the declaration into opinion 
testimony." In re Alton, 37 USPQ2d 1578, 1583 (Fed. Cir. 1996). 
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restriction endonuclease facilitated analyses, including serial analysis of gene expression (SAGE) and 

gene expression fingerprinting and (5) EST analysis." (Rockett Declaration, f 7.) 

Nowhere does the Patent Examiner address the fact that, as described on page 10 of the 
Schebye '470 application, the claimed polynucleotides can be used as highly specific probes in, for 
example, cDNA microarrays - probes that without question can be used to measure both the existence 
and amount of complementary RNA sequences known to be the expression products of the claimed 
polynucleotides. The claimed invention is not, in that regard, some random sequence whose value as a 
probe is speculative or would require further research to determine. 

Given the fact that the claimed polynucleotide is known to be expressed, its utility as a 

measuring and analyzing instrument for expression levels is as indisputable as a scale's utility for 

measuring weight. This use as a measuring tool, regardless of how the expression level data ultimately 

would be used by a person of ordinary skill in the art, by itself demonstrates that the claimed invention 

provides an identifiable, real-world benefit that meets the utility requirement. Raytheon v. Roper, 724 

F.2d 951, (Fed. Cir. 1983) (claimed invention need only meet one of its stated objectives to be useful); 

In re Cortwright, 165 F.3d 1353, 1359 (Fed. Cir. 1999) (how the invention works is irrelevant to 

utility); MPEP § 2107 ("Many research tools such as gas chromatographs, screening assays, and 

nucleotide sequencing techniques have a clear, specific, and unquestionable utility (e.g., they are useful 

in analyzing compounds )" (emphasis added). 

Literature reviews published before the filing of the Schebye '470 application 

describing the state of the art further confirm the claimed invention's utility. Rockett et al. confirm, for 

example, that the claimed invention is useful for differential expression analysis regardless of how 

expression is regulated: 

Despite the development of multiple technological advances which have recently 
brought the field of gene expression profiling to the forefront of molecular analysis, 
recognition of the importance of differential gene expression and characterization of 
differentially expressed genes has existed for many years. 

* * * 

Although differential expression technologies are applicable to a broad range of models, 
perhaps their most important advantage is that, in most cases, absolutely no prior 
knowledge of the specific genes which are up- or down-regulated is required. 
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* * * 

Whereas it would be informative to know the identity and functionality of all genes 
up/down regulated by . . . toxicants, this would appear a longer term goal .... 
However, the current use of gene profiling yields a pattern of gene changes for a 
xenobiotic of unknown toxicity which may be matched to that of well characterized 
toxins, thus alerting the toxicologist to possible in vivo similarities between the unknown 
and the standard, thereby providing a platform for more extensive toxicological 
examination, (emphasis in original) 

Rockett et al., Differential gene expression in drug metabolism and toxicology: practicalities, problems 

and potential , Xenobiotica 29:655-691 (July 1999) (Reference No. 5). 

In another pre- July 2000 article, Lashkari et al. state explicitly that sequences that are merely 

"predicted" to be expressed (predicted Open Reading Frames, or ORFs) - the claimed invention in 

fact is known to be expressed - have numerous uses: 

Efforts have been directed toward the amplification of each predicted ORF or any 
other region of the genome ranging from a few base pairs to several kilobase pairs. 
There are many uses for these amplicons- they can be cloned into standard vectors or . 
specialized expression vectors, or can be cloned into other specialized vectors such as 
those used for two-hybrid analysis. The amplicons can also be used directly by, for 
example, arraying onto glass for expression analysis , for DNA binding assays, or for . 
any direct DNA assay, (emphasis added) 

Lashkari et al., Whole genome analysis: Experimental access to all genome sequenced segments 
through larger-scale efficient oligonucleotide synthesis and PCR , Proc. Nat. Acad. Sci. 94:8945-8947 
(Aug. 1997) (Reference No. 6). 

B. The use of polynucleotides coding for polypeptides expressed by humans as 

tools for toxicology testing, drug discovery, and the diagnosis of disease is now 
"well-established" 

The technologies made possible by expression profiling and the DNA tools upon which they 
rely are now well-established. The technical literature recognizes not only the prevalence of these 
technologies, but also their unprecedented advantages in drug development, testing and safety 
assessment. These technologies include toxicology testing, e.g., as described by Bedilion, Rockett, and 
Iyer in their Declarations. 
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Toxicology testing is now standard practice in the pharmaceutical industry. See, e.g., John C. 

Rockett et al., supra: 

Knowledge of toxin-dependent regulation in target tissues is not solely an academic pursuit as 
much interest has been generated in the pharmaceutical industry to harness this technology in 
the early identification of toxic drug candidates, thereby shortening the developmental process 
and contributing substantially to the safety assessment of new drugs. (Reference No. 5, page 
656) 

To the same effect are several other scientific publications, including Emile F. Nuwaysir et al., 

Microarravs and toxicology: The advent of toxicogenomics . Molecular Carcinogenesis 24:153-159 

(1999) (Reference No. 7); Sandra Steiner and N. Leigh Anderson, Expression profiling in toxicology - 

- potentials and limitations . Toxicology Letters 112-13:467-471 (2000) (Reference No. 8). 

Nucleic acids useful for measuring the expression of whole classes of genes are routinely 

incorporated for use in toxicology testing. Nuwaysir et al. describes, for example, a Human ToxChip 

comprising 2089 human clones, which were selected 

for their well-documented involvement in basic cellular processes as well as their responses to 
different types of toxic insult. Included on this list are DNA replication and repair genes, 
apoptosis genes, and genes responsive to PAHs and dioxin-like compounds, peroxisome 
proliferators, estrogenic compounds, and oxidant stress. Some of the other categories of genes 
include transcription factors, oncogenes, tumor suppressor genes, cyclins, kinases, 
phosphatases, cell adhesion and motility genes, and homeobox genes. Also included in this 
group are 84 housekeeping genes, whose hybridization intensity is averaged and used for signal 
normalization of the other genes on the chip. 

See also Table 1 of Nuwaysir et al. (listing additional classes of genes deemed to be of special interest 

in making a human toxicology microarray). 

The more genes that are available for use in toxicology testing, the more powerful the technique. 

"Arrays are at their most powerful when they contain the entire genome of the species they are being 

used to study." John C. Rockett and David J. Dix, Application of DNA arrays to toxicology . Environ. 

Health Perspec. 107:68 1-685 (1999) (Reference No. 9). Control genes are carefully selected for their 

stability across a large set of array experiments in order to best study the effect of toxicological 

compounds. See attached email from the primary investigator on the Nuwaysir paper, Dr. Cynthia 

Afshari, to an Incyte employee, dated July 3, 2000, as well as the original message to which she was 

responding (Reference No. 10), indicating that even the expression of carefully selected control genes 
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can be altered. Thus, there is no expressed gene which is irrelevant to screening for toxicological 
effects, and all expressed genes have a utility for toxicological screening. 

Further evidence of the well-established utility of all expressed polypeptides and 
polynucleotides in toxicology testing is found in U.S. Pat. No. 5,569,588 (Reference No. 4e) 
and published PCT applications WO 95/21944 (Reference No. 4a), WO 95/20681 (Reference 
No. 2), and WO 97/13877 (Reference No. 4g). 

WO 95/21944 ("Differentially expressed genes in healthy and diseased subjects"), 
published August 17, 1995, describes the use of microarrays in expression profiling analyses, 
emphasizing that patterns of expression can be used to distinguish healthy tissues from diseased tissues 
and that patterns of expression can additionally be used in drug development and toxicology studies, 
without knowledge of the biological function of the encoded gene product. In particular, and with 
emphasis added: 

The present invention involves . . . methods for diagnosing diseases . . . 
characterized by the presence of [differentially expressed] . . . genes, despite the absence 
of knowledge about the gene or its function . The methods involve the use of a 
composition suitable for use in hybridization which consists of a solid surface on which is 
immobilized at pre-defined regions thereon a plurality of defined oligonucleotide/ 
polynucleotide sequences for hybridization. Each sequence comprises a fragment of an 
EST . . . . Differences in hybridization patterns produced through use of this composition 
and the specified methods enable diagnosis of diseases based on differential expression 
of genes of unknown function . . . . [abstract] 

The method [of the present invention] involves producing and comparing 
hybridization patterns formed between samples of expressed mRNA or cDNA 
polynucleotide sequences . . . and a defined set of oligonucleotide/polynucleotide[] . . . 
immobilized on a support. Those defined [immobilized] oligonucleotide/polynucleotide 
sequences are representative of the total expressed genetic component of the cells , 
tissues, organs or organism as defined by the collection of partial cDNA sequences 
(ESTs). [page 2] 

The present invention meets the unfilled needs in the art by providing methods for 
the . . . use of gene fragments and genes, even those of unknown full length sequence and 
unknown function, which are differentially expressed in a healthy animal and in an animal 
having a specific disease or infection by use of ESTs derived from DNA libraries of 
healthy and/or diseased/infected animals, [page 4] 
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Yet another aspect of the invention is that it provides ... a means for . . . 
monitoring the efficacy of disease treatment regimes including . . . toxicological effects 
thereof ." [page 4] 

It has been appreciated that one or more differentially identified EST or gene- 
specific oligonucleotide/polynucleotides define a pattern of differentially expressed genes 
diagnostic of a predisease, disease or infective state. A knowledge of the specific 
biological function of the EST is not required only that the EST[] identifies a gene or 
genes whose altered expression is associated reproducibly with the predisease, disease 
or infectious state, [page 4] 

As used herein, the term 'disease' or 'disease state' refers to any condition which 
deviates from a normal or standardized healthy state in an organism of the same species 
in terms of differential expression of the organism's genes. . . [whether] of genetic or 
environmental origin, for example, an inherited disorder such as certain breast cancers. . . 
.[or] administration of a drug or exposure of the animal to another agent, e.g., nutrition, 
which affects gene expression, [page 5] 

As used herein, the term 'solid support' refers to any known substrate which is 
useful for the immobilization of large numbers of oligonucleotide/polynucleotide 
sequences by any available method . . . [and includes, inter alia,] nitrocellulose, . . . glass, 
silica. . . . [page 6] 

By EST' or Expressed Sequence Tag' is meant a partial DNA or cDNA 
sequence of about 150 to 500, more preferably about 300, sequential nucleotides. . . . 
[page 6] 

One or more libraries made from a single tissue type typically provide at least 
about 3000 different (i.e., unique) ESTs and potentially the full complement of all 
possible ESTs representing all cDNAs e.g., 50,000 - 100,000 in an animal such as a 
human , [page 7] 

The lengths of the defined oligonucleotide/ polynucleotides may be readily 
increased or decreased as desired or needed. . . . The length is generally guided by the 
principle that it should be of sufficient length to insure that it is onfl average only 
represented once in the population to be examined , [page 7] 

Comparing the . . . hybridization patterns permits detection of those defined 
oligonucleotide/ polynucleotides which are differentially expressed between the healthy 
control and the disease sample by the presence of differences in the hybridization 
patterns at pre-defined regions [of the solid support], [page 13] 

It should be appreciated that one does not have to be restricted in using ESTs 
from a particular tissue from which probe RNA or cDNA is obtained[;] rather any or all 



119047 



14 



09/918,624 



Docket No.: PA-0033 US 

ESTs (known or unknown) may be placed on the support. Hybridization will be used 
rtol form diagnostic patterns or to identify which particular EST is detected. For 
example, all known ESTs from an organism are used to produce a 'master' solid support 
to which control sample and disease samples are alternately hybridized, [page 14] 

Diagnosis is accomplished by comparing the two hybridization patterns , wherein 
substantial differences between the first and second hybridization patterns indicate the 
presence of the selected disease or infection in the animal being tested. Substantially 
similar first and second hybridization patterns indicate the absence of disease or infection. 
This[,] like many of the foregoing embodiments^] may use known or unknown ESTs 
derived from many libraries, [page 18] 

Still another intriguing use of this method is in the area of monitoring the effects of 
drugs on gene expression , both in laboratories and during clinical trials with animal [s], 
especially humans, [page 18] 

WO 95/20681 ("Comparative Gene Transcript Analysis"), filed in 1994 by Appellants' 
assignee and published August 3, 1995, has three issued U.S. counterparts: U.S. Pat. Nos. 5,840,484, 
issued November 24, 1998; 6,114,114, issued September 5, 2000; and 6,303,297, issued October 
16, 2001. 

The specification describes the use of transcript expression patterns, or "images", each 
comprising multiple pixels of gene-specific information, for diagnosis, for cellular phenotyping, and in 
toxicology and drug development efforts. The specification describes a plurality of methods for 
obtaining the requisite expression data — one of which is microarray hybridization - and equates the 
uses of the expression data from these disparate platforms. In particular, and with emphasis added: 

The invention provides a "method and system for quantifying the relative 
abundance of gene transcripts in a biological specimen. . . . [G]ene transcript imaging can 
be used to detect or diagnose a particular biological state, disease, or condition which is 
correlated to the relative abundance of gene transcripts in a given cell or population of 
cells. The invention provides a method for comparing the gene transcript image analysis 
from two or more different biological specimens in order to distinguish between the two 
specimens and identify one or more genes which are differentially expressed between the 
two specimens." [abstract] 

" [Wle see each individual gene product as a 'pixel' of information , which relates 
to the expression of that, and only that, gene . We teach herein [] methods whereby the 
individual 'pixels' of gene expression information can be combined into a single gene 
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transcript 'image/ in which each of the individual genes can be visualized simultaneously 
and allowing relationships between the gene pixels to be easily visualized and 
understood." [page 2] 

"The present invention avoids the drawbacks of the prior art by providing a 
method to quantify the relative abundance of multiple gene transcripts in a given biological 

specimen The method of the instant invention provides for detailed diagnostic 

comparisons of cell profiles revealing numerous changes in the expression of individual 
transcripts." [page 6] 

"High resolution analysis of gene expression be used directly as a diagnostic 
profile . ..." [page 7] 

"The method is particularly powerful when more than 100 and preferably more 
than 1,000 gene transcripts are analyzed." [page 7] 

"The invention . . . includes a method of comparing specimens containing gene 
transcripts." [page 7] 

"The final data values from the first specimen and the further identified sequence 
values from the second specimen are processed to generate ratios of transcript 
sequences, which indicate the differences in the number of gene transcripts between the 
two specimens." [i.e., the results yield analogous data to microarrays] [page 8] 

"Also disclosed is a method of producing a gene transcript image analysis by first 
obtaining a mixture of mRNA, from which cDNA copies are made." [page 8] 

"In a further embodiment, the relative abundance o the gene transcripts in one cell 
type or tissue is compared with the relative abundance of gene transcript numbers in a 
second cell type or tissue in order to identify the differences and similarities." [page 9] 

"In essence, the invention is a method and system for quantifying the relative 
abundance of gene transcripts in a biological specimen. The invention provides a 
method for comparing the gene transcript image from two or more different biological 
specimens in order to distinguish between the two specimens. ..." [page 9] 

"[T]wo or more gene transcript images can be compared and used to detect or 
diagnose a particular biological state, disease, or condition which is correlated to the 
relative abundance of gene transcripts in a given cell or population of cells." [pages 9 - 
10] 

"The present invention provides a method to compare the relative abundance of 
gene transcripts in different biological specimens. . . . This process is denoted herein as 
gene transcript imaging. The quantitative analysis of the relative abundance for a set of 
gene transcripts is denoted herein as 'gene transcript image analysis' or 'gene transcript 
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frequency analysis'. The present invention allows one to obtain a profile for gene 
transcription in any given population of cells or tissue from any type of organism ." [page 
11] 

"The invention has significant advantages in the fields of diagnostics, toxicology 
and pharmacology, to name a few." [page 12] 

M [G]ene transcript sequence abundances are compared against reference 
database sequence abundances including normal data sets for diseased and healthy 
patients. The patent has the disease(s) with which the patient's data set most closely 
correlates ." [page 12] 

"For example, gene transcript frequency analysis can be used to different normal 
cells or tissues from diseased cells or tissues. . . ." [page 12] 

" In toxicology , . . . [g]ene transcript imaging provides highly detailed information 
on the cell and tissue environment, some of which would not be obvious in conventional, 
less detailed screening methods. The gene transcript image is a more powerful method to 
predict drug toxicity and efficacy . Similar benefits accrue in the use of this tool in 
pharmacology. ..." [page 12] 

" In an alternative embodiment , comparative gene transcript frequency analysis is 
used to differentiate between cancer cells which respond to anti-cancer agents and those 
which do not respond." [page 12] 

"In a further embodiment, comparative gene transcript frequency analysis is used 
... for the selection of better pharmacologic animal models." [page 14] 

"In a further embodiment, comparative gene transcript frequency analysis is used 
in a clinical setting to give a highly detailed gene transcript profile of a diseased state or 
condition." [page 14] 

" An alternate method of producing a gene transcript image includes the steps of 
obtaining a mixture of test mRNA and providing a representative array of unique probes 
whose sequences are complementary to at least some of the test mRNAs. Next, a fixed 
amount of the test mRNA is added to the arrayed probes. The test mRNA is incubated 
with the probes for a sufficient time to allow hybrids of the test mRNA and probes to 
form. The mRNA-probe hybrids are detected and the quantity determined ." [page 15] 

" ITIhis research tool provides a way to get new drugs to the public faster and 
more economically." [page 36] 

" In this method, the particular physiologic function of the protein transcript need 
not be determined to qualify the gene transcript as a clinical marker." [page 38] 
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"[T]he gene transcript changes noted in the earlier rat toxicity study are carefully 
evaluated as clinical markers in the followed patients. Changes in the gene transcript 
image analyses are evaluated as indicators of toxicity by correlation with clinical signs and 
symptoms and other laboratory results. . . . The . . . analysis highlights any toxicological 
changes in the treated patients." [page 39] 

U.S. Pat No. 5,569,588 ("Methods for Drug Screening") ("the '588 patent"), issued 
October 29, 1996, with a priority date of August 1995, describes an expression profiling platform, the 
"genome reporter matrix", which is different from nucleic acid microarrays. Additionally describing use 
of nucleic acid microarrays, the patent makes clear that the utility of comparing multidimensional 
expression datasets is independent of the methods by which such profiles are obtained. The patent 
speaks clearly to the usefulness of such expression analyses in drug development and toxicology, 
particularly pointing out that a gene's failure to change in expression level is a useful result. Thus, with 
emphasis added, 

The invention provides "[m]ethods and compositions for modeling the 
transcriptional responsiveness of an organism to a candidate drug. . . . [The final step of 
the method comprises] comparing reporter gene product signals for each cell before and 
after contacting the cell with the candidate drug to obtain a drug response profile which 
provides a model of the transcriptional responsiveness of said organism to the candidate 
drug." [abstract] 

"The present invention exploits the recent advances in genome science to provide 
for the rapid screening of large numbers of compounds against a systemic target 
comprising substantially all targets in a pathway for] organism ." [col. 1] 

"The ensemble of reporting cells comprises as comprehensive a collection of 
transcription regulatory genetic elements as is conveniently available for the targeted 
organism so as to most accurately model the systemic transcriptional response. Suitable 
ensembles generally comprise thousands of individually reporting elements; preferred 
ensembles are substantially comprehensive, i.e. provide a transcriptional response 
diversity comparable to that of the target organism. Generally, a substantially 
comprehensive ensemble requires transcription regulatory genetic elements from at least a 
majority of the organism's genes, and preferably includes those of all or nearly all of the 
genes . We term such a substantially comprehensive ensemble a genome reporter matrix." 
[col. 2] 

"Drugs often have side effects that are in part due to the lack of target specificity. 
• • • [A] genome reporter matrix reveals the spectrum of other genes in the genome also 
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affected by the compound. In considering two different compounds both of which 
induce the ERG10 reporter, if one compound affects the expression of 5 other reporters 
and a second compound affects the expression of 50 other reports, the first compound is, 
a priori, more likely to have fewer side effects." [cols. 2-3] 

"Furthermore, it is not necessary to know the identity of any of the responding 
genes ." [col. 3] 

M [A]ny new compound that induces the same response profile as [a] . . . 
dominant tubulin mutant would provide a candidate for a taxol-like pharmaceutical." 
[col. 4] 

"The genome reporter matrix offers a simple solution to recognizing new 
specificities in combinatorial libraries. Specifically, pools of new compounds are tested 
as mixtures across the matrix. If the pool has any new activity not present in the original 
lead compound, new genes are affected among the reporters." [col. 4] 

" A sufficient number of different recombinant cells are included to provide an 
ensemble of transcriptional regulatory elements of said organism sufficient to model the 
transcriptional responsiveness of said organism to a drug. In a preferred embodiment, the 
matrix is substantially comprehensive for the selected regulatory elements, e.g. essentially 
all of the gene promoters of the targeted organism are included." [cols. 6-7] 

"In a preferred embodiment, the basal response profiles are determined. . . . The 
resultant electrical output signals are stored in a computer memory as genome reporter 
output signal matrix data structure associating each output signal with the coordinates of 
the corresponding microtiter plate well and the stimulus or drug. This information is 
indexed against the matrix to form reference response profiles that are used to determine 
the response of each reporter to any milieu in which a stimulus may be provided. After 
establishing a basal response profile for the matrix, each cell is contacted with a 
candidate drug. The term drug is used loosely to refer to agents which can provoke a 
specific cellular response. . . . The drug induces a complex response pattern of 
repression, silence and induction across the matrix . . . .The response profile reflects the 
cell's transcriptional adjustments to maintain homeostasis in the presence of the drug. . . . 
After contacting the cells with the candidate drug, the reporter gene product signals from 
each of said cells is again measured to determine a stimulated response profile. The 
basal o[r] background response profile is then compared with . . . the stimulated 
response profile to identify the cellular response profile to the candidate drug." [cols. 7 - 
8] 

" In another embodiment of the invention , a matrix [i.e., array] of hybridization 
probes corresponding to a predetermined population of genes of the selected organism is 
used to specifically detect changes in gene transcription which result from exposing the 
selected organism or cells thereof to a candidate drug. In this embodiment, one or more 
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cells derived from the organism is exposed to the candidate drug in vivo or ex vivo under 
conditions wherein the drug effects a change in gene transcription in the cell to maintain 
homeostasis. Thereafter, the gene transcripts, primarily mRNA, of the cell or cells is 
isolated . . . [and] then contacted with an ordered matrix [array] of hybridization probes, 
each probe being specific for a different one of the transcripts, under conditions where 
each of the transcripts hybridizes with a corresponding one of the probes to form 
hybridization pairs. The ordered matrix of probes provides, in aggregate, complements 
for an ensemble of genes of the organism sufficient to model the transcriptional 
responsiveness of the organism to a drug. . . . The matrix- wide signal profile of the drug- 
stimulated cells is then compared with a matrix-wide signal profile of negative control 
cells to obtain a specific drug response profile." [col. 8] 

"The invention also provides means for computer-based qualitative analysis of 
candidate drugs and unknown compounds. A wide variety of reference response 
profiles may be generated and used in such analyses." [col. 8] 

" Response profiles for an unknown stimulus (e.g. new chemicals, unknown 
compounds or unknown mixtures) may be analyzed by comparing the new stimulus 
response profiles with response profiles to known chemical stimuli ." [col. 9] 

"The response profile of a new chemical stimulus may also be compared to a 
known genetic response profile for target gene(s)." [col. 9] 

The August 11, 1997 press release from the '588 patent's assignee, Acacia Biosciences 
(now part of Merck) (reference "h" attached hereto), and the September 15, 1997 news report by 
Glaser, "Strategies for Target Validation Streamline Evaluation of Leads," Genetic Engineering News 
(reference "4i" attached hereto), attest the commercial value of the methods and technology described 
and claimed in the '588 patent. 

WO 97/13877 ("Measurement of Gene Expression Profiles in Toxicity 
Determinations"), published April 17, 1997, describes an expression profiling technology differing 
somewhat from the use of cDNA microarrays and differing from the genome reporter matrix of the '588 
patent; but the use of the data is analogous. As per its title, the reference describes use of expression 
profiling in toxicity determinations. In particular, and with emphasis added: 

"[T]he invention relates to a method for detecting and monitoring changes in gene 
expression patterns in in vitro and in vivo systems for determining the toxicity of drug 
candidates." [Field of the invention] 
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"An object of the invention is to provide a new approach to toxicity assessment 
based on an examination of gene expression patterns, or profiles , in in vitro or in vivo test 
systems." [page 3] 

"Another object of the invention is to provide a rapid and reliable method for 
correlating gene expression with short term and long term toxicity in test animals." [page 
3] 

"The invention achieves these and other objects by providing a method for 
massively parallel signature sequencing of genes expressed in one or more selected 
tissues of an organism exposed to a test compound. An important feature of the 
invention is the application of novel . . . methodologies that permit the formation of gene 
expression profiles for selected tissues .... Such profiles may be compared with those 
from tissues of control organisms at single or multiple time points to identify expression 
patterns predictive of toxicity ." [page 3] 

"As used herein, the terms 'gene expression profile,' and 'gene expression pattern' 
which is used equivalently, means a frequency distribution of sequences of portions of 
cDNA molecules sampled from a population of tag-cDNA conjugates. . Preferably, 
the total number of sequences determined is at least 1000; more preferably, the total 
number of sequences determined in a gene expression profile is at least ten thousand ." 
[page 7] 

"The invention provides a method for determining the toxicity of a compound by 
analyzing changes in the gene expression profiles in selected tissues of test organisms 

exposed to the compound Gene expression profiles derived from test organisms 

are compared to gene expression profiles derived from control organisms. ..." [page 7] 

Therefore, the potential benefit to the public, in terms of lives saved and reduced health care 

costs, are enormous. Evidence of the benefits of this information include: 

• In 1999, CV Therapeutics, an Incyte collaborator, was able to use Incyte gene 

expression technology, information about the structure of a known transporter gene, 
and chromosomal mapping location, to identify the key gene associated with Tangiers 
disease. This discovery took place over a matter of only a few weeks, due to the 
power of these new genomics technologies. The discovery received an award from the 
American Heart Association as one of the top 10 discoveries associated with heart 
disease research in 1999. 

In an April 9, 2000, article published by the Bloomberg news service, an Incyte 
customer stated that it had reduced the time associated with target discovery and 
validation from 36 months to 18 months, through use of Incyte' s genomic information 
database. Other Incyte customers have privately reported similar experiences. The 
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implications of this significant saving of time and expense for the number of drugs that 
may be developed and their cost are obvious. 



• In a February 10, 2000, article in the Wall Street Journal, one Incyte customer stated 
that over 50 percent of the drug targets in its current pipeline were derived from the 
Incyte database. Other Incyte customers have privately reported similar experiences. 
By doubling the number of targets available to pharmaceutical researchers, Incyte 
genomic information has demonstrably accelerated the development of new drugs. 

Because the Patent Examiner failed to address or consider the "well-established" utilities for the 
claimed invention in toxicology testing, drug development, and the diagnosis of disease, the Examiner's 
rejections should be overturned regardless of their merit. 

C. Objective evidence corroborates the utilities of the claimed invention 

There is, in fact, no restriction on the kinds of evidence a Patent Examiner may consider in 
determining whether a "real-world" utility exists. "Real-world" evidence, such as evidence showing 
actual use or commercial success of the invention, can demonstrate conclusive proof of utility. 
Raytheon v. Roper, 220 USPQ2d 592 (Fed. Cir. 1983); Nestle v. Eugene, 55 F.2d 854, 856, 12 
USPQ 335 (6th Cir. 1932). Indeed, proof that the invention is made, used or sold by any person or 
entity other than the patentee is conclusive proof of utility. United States Steel Corp. v. Phillips 
Petroleum Co., 865 F.2d 1247, 1252, 9 USPQ2d 1461 (Fed. Cir. 1989). 

Over the past several years, a vibrant market has developed for databases containing the 
sequences of all expressed genes (along with the polypeptide translations of those genes), in particular 
genes having medical and pharmaceutical significance such as the instant sequence. (Note that the value 
in these databases is enhanced by their completeness, but each sequence in them is independently 
valuable.) The databases sold by Appellants' assignee, Incyte, include exactly the kinds of information 
made possible by the claimed invention, such as tissue and disease associations. Incyte sells its 
database containing millions of sequences throughout the scientific community, including to 
pharmaceutical companies who use the information to develop new pharmaceuticals. 

Both Incyte' s customers and the scientific community have acknowledged that Incyte' s 
databases have proven to be valuable in, for example, the identification and development of drug 
candidates. Page et al., in discussing the identification and assignment of candidate drug targets, state 
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that "rapid identification and assignment of candidate targets and markers represents a huge challenge 

... [t]he process of annotation is similarly aided by the quantity and richness of the sequence specific 
databases that are currently available, both in the public domain and in the private sector (e.g. those 
supplied by Incyte Pharmaceuticals)" Page, M J. et al., "Proteomics: a major new technology for the 
drug discovery process," Drug Discov. Today 4:55-62 (1999) (Reference No. 1 1), see page 58, col. 
2). As Incyte adds information to its databases, including the information that can be generated only as 
a result of Incyte' s invention of the claimed polynucleotide and its use of that polynucleotide on cDNA 
microarrays, the databases become even more powerful tools. Thus the claimed invention adds more 
than incremental benefit to the drug discovery and development process. 

III. The Patent Examiner's rejections are without merit 

Rather than responding to the evidence demonstrating utility, the Examiner attempts to dismiss it 
altogether by arguing that the disclosed and well-established utilities for the claimed polynucleotide are 
not "specific, substantial, and credible" utilities. (Office Action at page 3). The Examiner is incorrect 
both as a matter of law and as a matter of fact. 

A. Because the uses of claimed polynucleotides in toxicology testing, drug 

discovery, and disease diagnosis are practical uses beyond mere study of the 
invention itself, the claimed invention has substantial utility 

The PTO's rejection of the claims at issue is tantamount to a rejection based on the grounds 
that the use of an invention as a tool for research is not a "substantial" use. Because the PTO's 
rejection assumes a substantial overstatement of the law, and is incorrect in fact, it must be overturned. 

There is no authority for the proposition that use as a tool for research is not a substantial utility. 
Indeed, the Patent Office has recognized that just because an invention is used in a research setting 
does not mean that it lacks utility (Section § 2107.01 of the Manual of Patent Examining Procedure, 8 th 
Edition, August 2001, under the heading I. Specific and Substantial Requirements, Research Tools): 

Many research tools such as gas chromatographs, screening assays, and nucleotide 
sequencing techniques have a clear, specific and unquestionable utility (e.g., they are 
useful in analyzing compounds). An assessment that focuses on whether an invention is 
useful only in a research setting thus does not address whether the specific invention is 
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in fact "useful" in a patent sense. Instead, Office personnel must distinguish between 
inventions that have a specifically identified utility and inventions whose specific utility 
requires further research to identify or reasonably confirm. 

The Patent Office's actual practice has been, at least until the present, consistent with that 
approach. It has routinely issued patents for inventions whose only use is to facilitate research, such as 
DNA ligases. These are acknowledged by the PTO's Training Materials themselves to be useful, as 
well as DNA sequences used, for example, as markers. 

Only a limited subset of research uses are not "substantial" utilities: those in which the only 
known use for the claimed invention is to be an object of further study, thus merely inviting further 
research. This follows from Brenner, in which the U.S. Supreme Court held that a process for making 
a compound does not confer a substantial benefit where the only known use of the compound was to 
be the object of further research to determine its use. Id. at 535. Similarly, in Kirk, the Court held that 
a compound would not confer substantial benefit on the public merely because it might be used to 
synthesize some other, unknown compound that would confer substantial benefit. Kirk, 376 F.2d at 
940, 945 ("What appellants are really saying to those in the art is take these steroids, experiment, and 
find what use they do have as medicines."). Nowhere do those cases state or imply, however, that a 
material cannot be patentable if it has some other beneficial use in research. 

Applicants' Showing of Facts Overcomes The Examiner's Concern That Applicants' 
Invention Lacks "Specific Utility" 

The Examiner alleges that "the claimed invention is not supported by either a specific and 
substantial asserted utility or a well-established utility"(Office Action, page 3.) 

Appellants' submission of additional facts overcomes this concern. Those facts demonstrate 
that, far from applying regardless of the specific properties of the claimed polynucleotides, the utility of 
Appellants' claimed polynucleotides as gene-specific probes depends upon specific properties of the 
polynucleotides, that is, their nucleic acid sequences. 

"[E]ach probe on ... [a "high density spotted microarray[]"], with careful design and sufficient 
length, and with sufficiently stringent hybridization and wash conditions, binds specifically and with 
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minimal cross-hybridization, to the probe's cognate transcript" 1 ; "[e]ach gene included as a probe on a 

microarray provides a signal that is specific to the cognate transcript, at least to a first 
approximation." 2 Accordingly, "each additional probe makes an additional transcript newly detectable 
by the microarray, increasing the detection range, and thus versatility, of this analytical device for gene 
expression profiling" 3 ; equally, "[e]ach new gene-specific probe added to a microarray thus increases 
the number of genes detectable by the device, increasing the resolving power of the device." 4 

Although not required for present purposes, it would be appropriate to state on the record here 
that the specificity of nucleic acid hybridization was well-established far earlier than the development of 
high density spotted microarrays in 1995, and indeed is the well-established underpinning of many, 
perhaps most, molecular biological techniques developed over the past 30 - 40 years. 

IV. By requiring the patent applicant to assert a particular or unique utility, the Patent 

Examination Utility Guidelines and Training Materials applied by the Patent Examiner 
misstate the law 

There is an additional, independent reason to overturn the rejections: to the extent the rejections 
are based on Revised Interim Utility Examination Guidelines (64 FR 71427, December 21, 1999), the 
final Utility Examination Guidelines (66 FR 1092, January 5, 2001) and/or the Revised Interim Utility 
Guidelines Training Materials (USPTO Website www.uspto.gov, March 1, 2000), the Guidelines and 
Training Materials are themselves inconsistent with the law. 

The Training Materials, which direct the Examiners regarding how to apply the Utility 
Guidelines, address the issue of specificity with reference to two kinds of asserted utilities: "specific" 
utilities which meet the statutory requirements, and "general" utilities which do not. The Training 
Materials define a "specific utility" as follows: 

A [specific utility] is specific to the subject matter claimed. This contrasts to general utility that 
would be applicable to the broad class of invention. For example, a claim to a polynucleotide 
whose use is disclosed simply as "gene probe" or "chromosome marker" would not be 



1 Declaration of Dr. John C. Rockett, H 10(i), emphasis added. 

2 Declaration of Dr. Vishwanath R. Iyer, H 7 (emphasis added). See the footnote at U 7 for a slightly more 
"nuanced" view. 

3 Declaration of Dr. John C. Rockett, H 10(ii). 

4 Declaration of Dr. Vishwanath R. Iyer, H 7. 
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considered to be specific in the absence of a disclosure of a specific DNA target. Similarly, a 
general statement of diagnostic utility, such as diagnosing an unspecified disease, would 
ordinarily be insufficient absent a disclosure of what condition can be diagnosed. 

The Training Materials distinguish between "specific" and "general" utilities by assessing 
whether the asserted utility is sufficiently "particular," i.e., unique (Training Materials at page 52) as 
compared to the "broad class of invention." (In this regard, the Training Materials appear to parallel 
the view set forth in Stephen G. Kunin, Written Description Guidelines and Utility Guidelines , 82 
J.P.T.O.S. 77, 97 (Feb. 2000) ("With regard to the issue of specific utility the question to ask is 
whether or not a utility set forth in the specification is particular to the claimed invention.")). 

Such "unique" or "particular" utilities never have been required by the law. To meet the utility 
requirement, the invention need only be "practically useful," Natta, 480 F.2d 1 at 1397, and confer a 
"specific benefit" on the public. Brenner, 383 U.S. at 534. Thus, incredible "throwaway" utilities, such 
as trying to "patent a transgenic mouse by saying it makes great snake food," do not meet this standard. 
Karen Hall, Genomic Warfare , The American Lawyer 68 (June 2000) (quoting John Doll, Chief of the 
Biotech Section of USPTO). 

This does not preclude, however, a general utility, contrary to the statement in the Training 
Materials where "specific utility" is defined (page 5). Practical real-world uses are not limited to uses 
that are unique to an invention. The law requires that the practical utility be "definite," not particular. 
Montedison, 664 F.2d at 375. Appellants are not aware of any court that has rejected an assertion of 
utility on the grounds that it is not "particular" or "unique" to the specific invention. Where courts have 
found utility to be too "general," it has been in those cases in which the asserted utility in the patent 
disclosure was not a practical use that conferred a specific benefit. That is, a person of ordinary skill in 
the art would have been left to guess as to how to benefit at all from the invention. In Kirk, for 
example, the CCPA held the assertion that a man-made steroid had "useful biological activity" was 
insufficient where there was no information in the specification as to how that biological activity could be 
practically used. Kirk, 376 F.2d at 941. 

The fact that an invention can have a particular use does not provide a basis for requiring a 
particular use. See In re Brana, 51 F.3d 1560, 1566; 34 USPQ2d 1436 (Fed. Cir. 1995) (disclosure 
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describing a claimed antitumor compound as being homologous to an antitumor compound having 
activity against a "particular" type of cancer was determined to satisfy the specificity requirement). 
"Particularity" is not and never has been the sine qua non of utility; it is, at most, one of many factors to 
be considered. 

Broad classes of inventions can satisfy the utility requirement so long as a person of ordinary 
skill in the art would understand how to achieve a practical benefit from knowledge of the class. Only 
classes that encompass a significant portion of nonuseful members would fail to meet the utility 
requirement. Montedison, 664 F.2d at 374-75. 

The Training Materials fail to distinguish between broad classes that convey information of 
practical utility and those that do not, lumping all of them into the latter, unpatentable category of 
"general" utilities. As a result, the Training Materials paint with too broad a brush. Rigorously applied, 
they would render unpatentable whole categories of inventions that heretofore have been considered to 
be patentable and that have indisputably benefitted the public, including the claimed invention. See 
supra § H.B. Thus the Training Materials cannot be applied consistently with the law. 

For all of the above reasons, applicants submit that the claimed invention, at least as recited in 
claims 1-5 and 1 1-14, is supported by both specific and substantial asserted utilities, a well as well 
established utilities, and withdrawal of the rejection of claims under 35 U.S.C. § 101 is therefore 
requested. 

35 U.S.C. § 112, First Paragraph, Rejection of Claims 1-5 and 11-14 

The Examiner has also rejected claims 1-5 and 11-14 under 35 U.S.C. § 112, first paragraph, 
since the claimed invention is not supported by either a specific and substantial asserted utility or a well 
established utility for the reasons set forth above, one skilled in the art would not know how to use the 
claimed invention. 

A pplicants Response 

To the extent that the rejection under 35 U.S.C. § 1 12, first paragraph, is based on the 
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improper allegation of lack of patentable utility under 35 U.S.C. § 101 for the reasons set forth by 
applicant above, it fails for the same reasons, and withdrawal of the rejection is therefore requested. 

35 U.S.C. § 112, Second Paragraph, Rejection of Claims 1-5 and 11-14 

The Examiner has rejected claimsl-5 and 11-14 under 35 U.S.C. § 112, second paragraph as 
being indefinite for failing to particularly point out and distinctly claim the subject matter which 
applicants regards as the invention. 

The Examiner stated that claims 2 and 3 are rejected over the recitation of the phrases 
"differential expression of the cDNAs is greater than 2.5" and "differential expression of the cDNAs is 
greater than 3.0", respectively. It is unclear how the numbers 2.5 and 3.0 are derived, what the units of 
expression and if it is a ratio, what is the standard compared to which these numbers have been 
calculated. In the absence of any definition of numerical differential expression of the cDNAs or the 
standard of comparison of the claim, it is not clear what is claimed. 

Applicants Response 

Applicants disagree that the phrase "differential expression" is not sufficiently defined in the 
specification, nor the means by which the numerical values for differential expression are determined. 

"Differential Expression" is specifically defined at page 5, lines 27-29 as: 

refers to an increased, upregulated or present, or decreased, downregulated or 
absent, gene expression as detected by the absence, presence, or at least two-fold 
changes in the amount of transcribed messenger RNA or translated protein in a sample. 

Thus, the reference to "two-fold changes" clearly implies a ratio of two observations. The 

specification further provides at page 2, lines 30-32 that "The present invention provides a combination 

comprising a plurality of cDNAs for use in detecting changes in expression of genes encoding proteins 

associated with adipocyte differentiation.", clearly indicating that the differential expression compares 

gene expression between undifferentiated and differentiated adipocytes. The specification further 

discloses at page 7, lines 29-30 that 'The present invention identifies cDNAs which are differentially 

expressed during the conversion of preadipocytes to adipocytes". Clearly, therefore, the specification 

defines"differential expression" in the instant case as the measurement of mRNA expression in 
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differentiated adipocytes compared with that in preadipocytes and that the expression is determined in 

terms of a ratio between the two observations. 

The specification then describes at page 26, lines 8-13 how the samples to be compared for 
mRNA expression were prepared for analysis by culturing preadipocytes and inducing differentiation 
into mature adipocytes by culture in medium containing human insulin and a PPARy agonist. The 
specification then goes on to describe at pages 26-27 how mRNA samples from the two sample types 
were labeled for detection (Isolation and Labeling of Sample Polynucleotides ); The methods of 
hybridization, detection, and quantitation of mRNA expression at pages 27-28 (VII Hybridization 
and Detection); and the analysis of the results in terms of minimal criteria for differential expression 
(page 28, Data Analysis and Results). Clearly then, the specification defines differential expression, 
the means of quantitatively determining its value, and the standard for comparison. The claims are 
therefore clear and definite, and withdrawal of the rejection of claims 2 and 3 under 35 U.S.C. § 112, 
second paragraph is therefore requested. 

35 U.S.C. § 102fb\ Rejection of Claim 1 

The Examiner has rejected claim 1 under 35 U.S.C. § 102(b) as being anticipated by Turner et 
al. (Blood (1991), Vol 78(Suppl. 1), page 279). The Examiner stated that the rejection is based on the 
fact that, a recitation of the intended use of the claimed invention must result in a structural difference 
between the claimed invention and the prior art in order to patentably distinguish the claimed invention 
from the prior art. If the prior art structure is capable of performing the intended use, then it meets the 
claim. In a claim drawn to a process of making, the intended use must result in a manipulative 
difference as compared to the prior art. In this case, the Examiner stated, a cDNA having SEQ ID 
NO:l has clearly been taught by Turner et al. (Gen Bank Accession Number U70136; bases 1 to 
5041). The intended use of the present cclaim, i.e., differential expression during adipocyte 
differentiation is not given patentable weight because SEQ ID NO:l, as taught by Turner et al., 
inherently possesses the same claimed function as it is structurally identical. 

Applicants Response 

Applicants point out that claim 1 is a combination claim drawn to a pluraility of cDNA 
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sequences, SEQ ID NOs:l-71. As such the claimed combination is novel if at least one sequence is 

not found in the prior art. Whether or not SEQ ID NO:l is taught by Turner et aL, the combination is 
still novel and patentable if any other sequence in the claimed combination is novel and non-obvious. 
Since the Examiner has searched SEQ ID NOs:l, 3 and 8, and has cited no art against SEQ ID NO:3 
or SEQ ID NO: 8, these sequences have presumably been found free of the prior art, and the 
combination as recited in claim 1 is therefore novel and patentable. Turner et al does not teach or 
suggest SEQ ID NOs:3 or 8 or their use in detecting adipocyte differentiation, and withdrawal of the 
rejection of claim 1 as anticipated by Turner et al. is therefore requested. 
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CONCLUSION 



In light of the above amendments and remarks, Applicants submit that the present application is 
fully in condition for allowance, and request that the Examiner withdraw the outstanding 
objections/rejections. Early notice to that effect is earnestly solicited. Applicants further request that 
upon allowance of claim 1, that claims 6-10 be rejoined and examined as methods of use of the 
combination of claim 1 that depend from and are of the same scope as claim 1 in accordance with in 
Re Ochiai and the MPEP § 821.04. 

If the Examiner contemplates other action, or if a telephone conference would expedite 
allowance of the claims, Applicants invite the Examiner to contact the undersigned at the number 
listed below. 

Applicants believe that no fee is due with this communication. However, if the USPTO 
determines that a fee is due, the Commissioner is hereby authorized to charge Deposit Account 



Customer No.: 27904 
3160 Porter Drive 
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Fax: (650) 849-8886 



No. 09-0108. 



Respectfully submitted, 
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^differentially expressed genes in Healthy and drseased : Subject^ " * 

Cross Reference to Related Applications: av .is*-; - ecu ^ - ■ 

5 : r r >>:TO Serial No. 

,08/195,485 filed February 14, 1994, the contents of which are; incorporated herein by 

; referenOC; ■ -/ *■ hi, vV'tV- .*».-:_ i>: ivU" :V:-- J . : * „^;.«';;f : 

Field ofthe Invention ^ >!■ .,v;'!ij;i » ' ■ : ; .. v^^v, i M " V • ... . 
10 ^ T r " ' 3k ^skt >%entioi Elates to th& - use of immobilized 

oligonucleotide/polynucleotide or polynucleotide sequences for the identification, 
sequencing and characterization of genes which are implicated in disease, infection, 
or development and the use of such identified genes and the proteins encoded thereby 
-in diagnosis; projgnosis, therapy and drug discovery. ^ 

Background of the Invention - ° - ; 

Identification, sequencing and characterization of genes, especially 
human genes, is a major goal of modern scientific research. By identifying genes, 
determining their sequences and characterizing their biological function, it is possible 

20 to employ recotrinant DNA technology to produce large quantities of valuable "gene 
products", e.g., proteins and peptides. Additionally, knowledge of gene sequences 
can provide a key to diagnosis, prognosis and treatment of a variety of disease states 
in plants and animals which are characterized by inappropriate expression and/or 
repression of selected gene(s) or by the influence of external factors, e.g., carcinogens 

25 or teratogens, on gene function. The term disease-associated genes(s) is used herein 
in its broadest sence to mean not only genes associated with classical * inherited 
diseases, but also those associated with genetic predisposition to disease as well as 
infectious or pathogenic states resulting from gene expression by infectious agents or 
the effect on host cell gene expression by the presence of such a pathogen or its 

30 products Locating disease-associated genes will permit the development of 
diagnostic and prognostic reagents and methods, as well as possible therapeutic 
regimens, and the discovery of new drugs for treating or preventing the occurrence of 
such diseases. < 

Methods have been described for the identification of certain novel 

35 gene sequences, referred to as Expressed Sequence Tags (EST) [see, e.g., Adams et 
al, Science, 252:1651-1656 (1991); and International Patent Application No. 
WO93/00353, published January 7, 1993]. Conventially, an EST is a specific cDNA 
polynucleotide sequence, or tag, about 150 to 400 nucleotides in length, derived from 
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?a. messenger RNA inolecule by revere transcription, which is'ai marker fior^and 
component of, a human gene actually transcribed in vivo. ^However, as used herein 5 an 
EST also refers td a geriomidDN A ' fragment derived from;an organism, such as ! a 
:microorganism*the'DNAof^ I ^-yW* J'Oy/^ in w.i 

5 <- " 1 /■« Avariety of techniques have been described for identifying particular 
<gehe sequences on the basis of thfeir gene products*' For'ocampie, several techniques 
are described in the art [see, e.g., International Patent Application No. WO91A)7087, 
published May 30,1991]. Additionally known methods exist for the amplification of 
desired sequences [See,* e.g.; j International Patent; Application I No, W091/17271, 

10 published November 14, 1991, among others], e ^ v ^ ^ : . r/' 

* However, at present, there exist no established methods for filling the 

need in the art for methods and reagents which ^employ fragments erf differentially 
expressed genes of known, unknown (or previously unrecognized ) function or 
consequence to provide diagnostic and therapeutic methods and reagents for diagnosis 

15 and treatment of disease or infection; which ? conditions , are- characterized by such 
genes and gene products. It should be appreciated that it is the expression differences 
that are diagnostic of the altered state (e.g., predisease, disease, pathogenic, 
progression or infectious). Such genes associated with the altered state are likely to 
be the targets of drug discovery, whether the genes are the cause or the effect of the 

20 condition, identification of such genes provides insight into which gene expression 
needs to be re-altered in order to reestablished the healthy state. ; 

Summary of the Invention r - 

In one aspect, the invention provides methods for identifying gene(s) 

25 which are differentially expressed, for example, in a normal healthy organism and an 
organism having a disease. The method involves producing and , comparing 
hybridization patterns formed between samples of expressed mRNA or cDNA 
polynucleotide sequences obtained from either analogous cells, tissues or organs of a 
healthy organism and a diseased organism and a defined . set of 

30 oUgonucleotide/polynucleotide/polynucleotide sequence probes from either an 
healthy organism or a diseased organism immobilized on a support Those defined 
oligonucleotide/polynucleotide sequences are representative of the total expressed 
genetic component of the cells, tissues, organs or organism as defined the collection 
of partial cDNA sequences (ESTs). The differences between the hybridization 

35 patterns permit identification of those particular EST or gene-specific 
oligonucleotide/polynucleotide sequences associated with differential expression, and 
the identification of the EST permits identification of the clone from which it was 
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derived andusing ordinaiy skill further cloning and, if desired, sequencing of the full- 
length cDN A and genoniic counterpart, i.e,; gene, from whichlt wasfcbtriiiied k v * 1 ^< 
r '. ,j.-: rsiv-n v jn another aspect, the invention- provides methods substantia 
to those described above, but which permit identification of those gene(s) of a 
5 pathogen whictf are expressed in any biological sariipteof di-irifected organism based 
on comparative hybridization of RNA/d5NA samples derived from a healthy vebus 
infected organism, hybridized to an oligonucleotide/polynucleotide *et representative 
of the gene coding complement of the pathogen of interest- > - : ^ - ^ v 5 1 
rw^i %\i r ; , t In anbthw aspect, the invention pro 

10 to those 'described above, but which permit identification of those ESTs-specific 
oUgonucleotide^lyiradTOtide sequences of host gene(s) which represent genes being 
differentially expressed/ altered in expression by the disease state, I or infection and are 
expressed in any biological sample of an infected organism based on. comparative 
hybridization of RNA/cDNA saihples derived 4 from a healthy versus infected 

15' organism of interest ; ^ r "' i - I: ^' :,. ^..w,,^ 

^ v In a further aspect, the methods described above and in detail below, 
also provide methods for diagnosis of diseases or infections characterized by 
differentially expressed genes, the expression of which has been altered as a result of 
infection by the pathogen or disease causing agent in question. All identified 

20 differences provide the basis for diagnostic testing be it the altered expression of 
endogenous genes or the patterned expression of the genes of the infecting organism. 
Such patterns of altered expression are defined by comparing RNA/cDNA from the 
two states hybridized against a panel of oligonucleotide/polynucleotides representing 
the expressed gene component of a cell, tissue, organ or organism as defined by its 

25 collection of ESTs; r , r '■,•«■ 

Yet a further aspect of this invention provides a composition suitable 
for use in hybridization, which comprises a solid surface on which is immobilized at 
pre-defined regions thereon a plurality of defined oligonucleotide^)olynucleotide 
sequences for hybridization, each sequence comprising a fragment of an EST isolated 

30 from a cDNA or DNA library prepared from at least one selected tissue or cell 
sample of a healthy (ic, pre-disease state) animal, at least one analogous sample of 
an animal having a disease, at least one analogous sample of an animal infected with a 
pathogen or the pathogen itself, or any combination or multiple combinations thereof. 

An additional aspect of the invention provides an isolated gene 

35 sequence which is differentially expressed in a normal healthy animal and an animal 
having a disease, and is identified by the methods above. Similarly, an isolated 
pathogen gene sequence which is expressed in tissue or cell samples of an infected 
animal can be identified by the methods above. 
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Yet another aspect of the invention is that it provides not only a means 
for a static diagnostic but al^ provides :a means for a caixying- out the pipcedure^ over 
time to measure disease progression as well i as monitoring the efficacy of disease, 
treatment regimes including an toxicologic a Vt ?u% , 

5 > : ; ; iir| Another aspect of the invention is an isolated protein produced by 
expression of the -gene sequences, identified above. / Such proteins are useful in 
therapeutic -composm^ yfor drug 

development *0 i :^<yf^ ;T-, r-=^;c V- Wtt>v;i-?/ ; p?: v ''. 

v ^ : ;;>,>• Other aspects and advantages of the present invention are described; 

10 further in the following detailed description of the preferred embodiments thereof. 

Detailed Description of the Invenrion *. ; ^ -ju < V. = y m 

^ The present invention meets the unfulfilled needs in the art by 
providing methods for the identification and use of gene fragments and genes, even 

15 those of unknown full length sequence and >i unknown -function, which - are 
differentially expressed in a ileal thy animal and in an animal having a specific disease 
or infection *by use of T ESTs draved from ,DNA libraries of healthy a and/or 
diseased/infected animals. Employing the methods of this invention permits the, 
resulting identification and isolation of such genes by using their corresponding ESTs 

20 and thereby also permits the production of protein products encoded by; such genes. 
The genes themselves and/or protein products, if desired, may be employed in the 
diagnosis or therapy of the disease or infection with which , the genes are associated, 
and in the development of new drugs therefor. r >: ;?M 

It has been appreciated that one or more differentially identified EST 

25 or gene-specific oligonucleotide/polynucleotides define a pattern of differentially 
expressed genes diagnostic of a predisease, disease or infective state: A knowledge of 
the specific biological function of the EST is not required only that the ESTs 
identifies a gene or genes whose altered expression is associated reproducibly with 
the predisease, disease or infectious state. The differences permit the identification of 

30 gene products altered in their expression by the disease and represent those products 
most likely to be targets of therapeutic intervention. Similarly, the product may be of 
the infecting organism itself and also be an effective target of intervention. 

/. Definitions. 

35 Several words and phrases used throughout this specification are 

defined as follows: 

As used herein, the term "gene" refers to the genomic nucleotide 
sequence from which a cDNA sequence is derived, which cDNA produces an EST, as 

4 
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described below. ;The term gene classically refers to- the genomic sequence, which, 
upon processing; can produce different cDNAs, e.g:i by- splicing events.* c However, 
for ease of reading, any full-length counterpart cDNA sequence which gives rise to an 
EST will also be referred toby ^shc^mdhereiriasa-geneV'^ l 1 - r ; J v 
5 ^ 1 < '\i fThe torn "organism" includes without 1 limitation, microbes, plants and 

> ' , ' r The term "animal" is used in its broadest sense to include all members 
of the animal kingdom, including humans. It should be understood, however, that 
according to this invention the same species of animal which provides the biological 

10 sample also is the source of the defined immobilized oligonucleotide/polynucleotides 
as defined below. • * * - 1 *•* s^V *'*■■■■• :,: v ■'-«.* i ■ 

The term "pathogen" is defined herein 1 as any molecule or organism 
which is capable of infecting an animal or plant and rephcating its nucleic acid 
sequences in the cells or tissues of that animal or plant . Such a pathogen is generally 

15 aissociated with a disease condition in the infected animal or plantV Such : pathogens 
may include viruses, which replicate intra- or extra-cellularly, or other organisms, 
such as bacteria, fungi or parasites, which generally infect tissues or die blood: 
Certain pathogens or microorganisms are known to exist, in sequential arid 
distinguishable stages of development, e.g., latent stages, infective' stages, and stages 

20 which cause symptomatic diseases. In these different stages, the pathogens are 
anticipated to express differentially certain genes and/or turn on or off host cell gene 
expression. 

As used herein, the term "disease" oir "disease state" refers to any 
condition which deviates from a normal or standardized healthy state in an organism 

25 of the same species in terms of differential expression of the organism's genes. In 
other words, a disease state can be any illness or disorder be it of genetic or 
environmental origin , for example, an inherited disorder such as certain breast 
cancers, or a disorder which is characterized by expression of gene(s) normally in an 
inactive, 'turned off state in a healthy animal, or a disorder which is characterized by 

30 under-expression or no expression of gene(s) which is normally activated or 'turned 
on' in a normal healthy animal. Such differential expression of genes may also be 
detected in a condition caused by infection, inflammation, or allergy, a condition 
caused by development or aging of the animal, a condition caused by administration 
of a drug or exposure of the animal to another agent, e.g., nutrition, which affects 

35 gene expression. Essentially, the methods described herein can be adapted to detect 
differential gene expression resulting from any cause, by manipulation of the defined 
oligonucleotide/polynucleotides and the samples tested as described below. The 
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concept of disease or disease state also includes its temporal aspects in ; terms of 
progression and treatment \ «■ j.....*; • , { ^:>-> } . ;v .••■v r .:; f| ,,-- • Jtl( : 
u v.: : : ; 31iewphn^ir"differentiaUy expressed" refers ;to thc^e -situations ini, 
which a gene transcript is found in. differing numbers of copies, or in activated vs 
5 inactivated states, * in different cell types or tissue types of ah* organism, having a 
selectedrdisease as contrasted to the levels of the gene transcript found in the same 
cells or tissues of a healthy organism. Genes may, be differentially expressed in^ 
differing states of activation in 5 microorganisms or pathogens in different stages of 
development fFor example, multiple copies of gene transcripts may be found in an 

10 ; u organism having a selected disease, while only one, or significantly fewer copies, of 
the same gene transcript arc found in a healthy organism, or vice-versa. iv 

i > As used herein; the term "solid support" refers to any known substrate 
whichr.. is ; useful r for the immobilization , of . large w n ,of ; 

oligonucleotide/polynucleotide sequences by any available ^ method, to enable 

15 detectable hybridization of the immobilized oligonucleotide^polynucleotide sequences 
with other, polynucleotide sequences in a sample; Among a number of available solid 
supports, one desirable example is the supports described in /International Patent 
Application No. WO91/07087, published May 30, 199LAIsp useful arc suports such : 
as but not limited to nitrocellulose, mylein, glass, silica ans Pall Biodyne C® It is 

20 < also anticipated that improvements yet to be made to conventional solid supports may 
also be employed in this invention. , . 

i ^Theterm "surface" means any generally two-dimensional structure on- 
a solid support to which the desired oligonucleotide/polynucleotide sequence is 
attached or immobilized A surface may have steps, ridges, kinks, tenaces and the . 

25 > like. ; - . . . ,. s ...... ~ ;; . ^ 

As used herein, the term "predefined region" refers to a localized area 
on a surface of a solid support on which is immobilized one or multiple copies of a 
particular oligonucleotide/polynucleotide ? sequence and which enables the 
identification of the oligonucleotide/polynucleotide at the position, if hybridization of 

30 that oligonucleotide/polynucleotide to a sample polynucleotide occurs. 

By "immobilized", refers to the attachment of the 
oligonucleotide/polynucleotide to the solid support Means of immobilization arc 
known and conventional to those of skill in the art, and may depend on the type of 
support being used. 

35 By "EST" or "Expressed Sequence Tag" is meant a partial DNA or 

cDNA sequence of about 150 to 500, more preferably about 300, sequential 
nucleotides of a longer sequence obtained from a genomic or cDNA library prepared 
from a selected cell, cell type, tissue or tissue type, organ or organism which longer 
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sequence corresponds ^ to an mRNA-of a gene found in that library, .An EST is r 
generally DNA. One or more libraries made from a single tissue liypevtypicallyiju. 
provide at least about 3(X)0 different (i.e.- unique) ESTsand -potentially ; the fillip 
complement of all possible ESTs representing all cDNAs e.£ M 50;000- 100,000 dn an j 
5 animal such as a human. Further background and information on the construction of 
ESTs is described in M D. Adams et tal, Science, , 252:1651^1656 (1991): and 
International Application NumberPGT/US92/05222 (January 7, 1593). i y v . , ■ m 

tv „ . As used herein, the torn Mefin^ 
sequence" refers to a known nucleotide sequence fragment of a selected EST or gene. 

10 This term is used interchangeably ; with the term "fragments of ESTh These 
sequential sequences are generally comprised of between' about 15 to about < 45 
nucleotides and more preferably between about 20 to about 25 nucleotides in length; . 
Thus any single EST of 300 nucleotides in length may provide about 280 different ' 
defined oUgoriucleotide/pblynucleotide sequences of 20 nucleotides in length (e.g., 

15 20-mers). The lengths ; of die defined oligonucleotide/polynucleotides riiay be readily ^ 
increased or decreased as desired or needed, depending on the limitations of the solid 
support on which they may be immobilized or the requirements of the hybridization 
conditions to be employ edIThe length is generally guided by the principle that it 
should be of sufficient length to insure that it is one average only represented once in 

20 the population to be examined. Generally, these defined 

oligonucleotide/polynucleotides are RN A or DNA and are preferably derived from 
the anti-sense strand of the EST sequence or from a corresponding mRNA sequence 
to enable their hybridization with samples of RNA or DNA. Modified nucleotides' 
may be incorporated to increase stability and hybridization properties/ : 

25 By the term "plurality of defined '■ oligonucleotide/polynucleotide * 

sequences" is meant the following. A surface of a solid support may immobilize a 
large number of "defined oligonucleotide/polynucleotides". For example, depending 
upon the nature of the surface, it can immobilize from about 300 to upwards of : 
60,000 defined 20-mer oligonucleotide/polynucleotides. It is anticipated that future 

30 improvements to solid surfaces will permit considerably larger such pluralities to be 
immobilized on a single surface. A "plurality" of sequences refers to the use on any- 
one solid support of multiple different defined oligonucleotide/polynucleotides from a 
single EST from a selected library, as well as multiple different defined 
oligonucleotide/polynucleotides from different ESTs from the same library or many 

35 libraries from the same or different tissues, and may also include multiple identical 
copies of defined oligonucleotide/polynucleotides. Ultimately a pluarality has at least 
one oligonucleotide/polynucleotide per expressed gene in the entire organism For 
example, from a library producing about 5,000-10,000 ESTs, a single support can 
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include at least about 1-20 defined oligonucleotide/polynucleotides representing every* 
EST-in that, library ;s v The composition of defined oligonucleotide/polynucleotides^ 
which *make ; up a surface according to this invention may be selected or designed as 
desired. k:i yi'nuui^-.ny s-.M-^rr i ' I..;*.'.! f-' ;<A\v/- u>c\ r 

5 r c J : > ' - l i ■ , : The term "sanq>le" is employed in ithe description of this invention in 4 
several important ways, As used herein, the term '/sample" encompasses any cell or 
tissue from an: organism- Any desired cell or tissue :type in any desired state may be 
selected to form a sample. For example, the sample cell desired :may be a human T 
cell; the desired cell type for Use in this , invention ;may be a quiescent T cell or an 

10 / activated Tcelti ^ 

s • ! By the phrase "analogous sample" or "analogous ; cell or tissue" is y 
meant that according to this invention when the ESTs which- premde, the defined 
oligonucleotide/polynucleotides are, produced from a cDNA library prepared from a 
single tissue or, cell type source sample, e.g„ liver tissue of a human, then the samples v 

15 \ used to hybridize to those immobilized defined oligonucleotide/polynucleotides, are . 
preferably provided by the same type of sample from either a healthy or diseased 
animal, i.e., liver tissue of a healthy human and liver tissue of a diseased or infected 
human or from a human suspected of having that disease or infection. Alternatively, 
if the surface contains defined pligonucleotide^olynucleptides from multiple cells or „ 

20 tissues, then the "samples" which are hybridized thereto can be but are not limited to 
samples obtained from analogous multiple tissues or cells. 

By the term "detectably hybridizing" means that the sample from the 
healthy organism or diseased or infected organism is contacted with the defined 
oligonucleotide/polynucleotides on the surface for sufficient time to permit the 

25 formation of patterns of hybridization on the surfaces caused by hybridization 
between certain polynucleotide sequences in the samples with the certain immobilized 
defined oligonucleotide/polynucleotides. These patterns are made detectable by the 
use of available conventional techniques, such as fluorescent labelling of the samples. 
Preferably hybridization takes place under stringent conditions, e.g., revealing 

30 homologies of about 95%. However, if desired, other less stringent conditions may 
be selected. Techniques and conditions for hybridization at selected stringencies are 
well known in the art [see, e.g., Sambrook et al, Molecular Cloning. A Laboratory 
Manual.. Cold Spring Harbor Laboratory, Cold Spring Harbor, NY (1989)]. 

35 //. Compositions of The Invention 

The present invention is based upon the use of ESTs from any desired 
cell or tissue in known technologies for oligonucleotide/polynucleotide hybridization. 
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h y±«\ri£--Aj^:?y*S\\ An EST, as defined above, is for an animal v a sequence from a 
cDNA clone that corresponds to an tnRNA/ The EST sequences useful in the present f 
invention toe isolated preferably from cDNA^ libraries using £ rapid screening and 
5 ' sequencing technique. Giistom made cDNA libraries are made using known 
techniques. See v generally, Sambrook et al, cited above. Briefly, mRNA from a 
selected cell or tissue is reverse transcribed into con^)lementary JDN A (cDNA) using 
the reverse transcriptase enzyme and madei^ 

with DNA polymerase or reverse transcriptase. Restriction enzyme sites are added to 

10' the cDNA and it is cloned into a vector. The result is a cDNA library . Alternatively, 
commercially available cDNA libraries may be used. Libraries of cDNA can also be 
generated from recombinant expression of genomic DNA using known techniques,' 
including polymerase chain i^ction^erived t^hniquesr « ^ : : .-v-< 
i': ■[ u ESTs (which* can range from about 150 to about 500 nucleotides in 

15 length, preferably about 300 nucleotides)' can be obtained through sequence analysis 
from either end of the cDNA insert Desirably, the DNA Hbraries used to obtain 
ESTs use directional cloning methods so that either the 5' end of the cDN A (likely to 
contain coding sequence) or the 3' end (likely to be a non-coding sequence) can be 
selectively obtained. 1 ' ; **.- 

20 : In general, the method for obtaining ESTs comprises applying 

conventional automated DNA sequencing technology to screen clones, 
advantageously randomly selected clones, from a cDNA library. The cDNA libraries' 
from the desired tissue can be preprocessed, or edited, by conventional techniques to 
reduce repeated sequencing of high and intermediate abundance clones and to 

25 maximize the chances of finding rare messages from specific cell populations. 
Preferably, preprocessing includes the use of defined composition prescreening 
probes, e.g., cDNA corresponding to mitochondria, abundant sequences, ribosomes, 
actins, myelin basic polypeptides, or any other known high abundance peptide. These 
prescreening probes used for preprocessing are generally derived from known ESTs. 

30 Other useful preprocessing techniques include subtraction hybridization, which 
preferentially reduces die population of highly represented sequences in the library 
[e.g., see Fargnoli et al, Anal. Biochem., 122:364 (1990)] and normalization, which 
results in all sequences being represented in approximately equal proportions in the 
library [Patanjali et al, Proc. Natl. Acad. Sci. USA . £8:1943 (1991)]. Additional 

35 prescreening/differential screening approaches are known to those skilled in the art. 

ESTs can then be generated from partial DNA sequencing of the 
selected clones. The ESTs useful in the present invention are preferably generated 
using low redundancy of sequencing, typically a single sequencing reaction. While 
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single sequencing reactions may; have an ; accuracy, as low as 90%, this nevertheless 
provides sufficient fidelity for . identification of the .sequence and ^design, of jPGR 

•primers. Uioi^Y L r V..^ Hi;,;*./.;,, vW*> Oi'- ^bc.;^* 

Ki* -V i If desired, - the location of an EST in a full length cDNA is determined 

5 by analyzing the EST for the presence of coding sequence. A conventional computer 
program is used to predict the extent and orientation of the coding .region of a 
sequence (using all six reading frames). Based on this information, it is possible to 
infer the presence of start or stop codons within a sequence and whether the sequence 
is completely coding - or completdy Jion-<xxiing or a combination of the two. If start 
10 or stop codons are present, then the EST can cover both part of the 5-untranslated or 
3-untranslated part of the mRN A (respectively) as well as part ( of the coding 
sequence. If no coding sequence is present, it is likely that the EST is derived firom 
4he 3' untranslated sequence due to its-longer length and the fact that most cDNA 
library construction methods are biased toward the 3' end of the mRNA. It should be 
15 .understood that both coding and non-coding regions may provide ESTs equally useful 
in the described invention. - u .. v \ A M ^ 

A number of specific ESTs suitable -for. use in the present 
invention are described above Adams et al (supral . which may be incorporated by 
reference herein, to describe non-essential examples of desirable ESTs. -Other ESTs 
20 exist in the art which may also, be useful in this invention, as will ESTs yet to be 
developed by these known techniques. 

; . % B. Preparing the Solid Support of the Invention 

Oligonucleotide sequences which are fragments : of defined 
sequence are derived from each EST by conventional means, e.g., conventional 

25 chemical synthesis or recombinant techniques. _ : , : Each defined 
oligonucleotide/polynucleotide sequence as described above is a fragment, can be, but 
is not necessarily an anti-sense fragment, of an EST isolated from a DNA library 
prepared from a selected cell or tissue type from a selected animal. For use in the 
present invention, it is presently preferred that the defined 

30 oligonucleotide/polynucleotide sequences are 20-25mers. As described above, for 
each EST a number of such 20-25mers may be generated. The lengths may vary as 
described above as well as the composition. For example 
oligonucleotide/polynucleotides can be modified based on the Oligo 4.0 or simiolar 
programs to predict hybridization potential or to include modifieid nucleotides for the 

35 reasons given above. It is alos appreciated that large DNA segments may be 
employed including entire ESTs or even full length genes particular when inserted 
into cloning vectors. 
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: ;. n . .-i , m. u ^ A 1 plurality; of these defined • oHgoniideotide/polynucleotide 
^ sequences are then attached to a selected solid support conventionally used for the 
* attachment of nucleotide sequences again by* known means, in contrast to "other 
- technologies avaUable in the^aru this support is designed to contain defined, not 
5 random, oligonucleotide/polynucleotide sequences. The iEST ftagments^or defined 
oligonucleotide/polynucleotide sequences, immobilized • on the solid sup^rt can 
include fragments of one or more ESTs from a library of at least one selected tissue 
> ' or cell sample of a healthy animal, at least one analogous' sample of the animal having 
: a disease, at least one analogous sample of the animal infected with a pathogen, and 
10 s any combination thereof. • o - ry.O ^ 1 !a <..s * » r,: 

f ^ ' ? fc ' c Numerous conventional methods are employed for attach^ 
biological molecules such as oligonucleotide/polynucleotide sequences to surfaces of 
a variety of solid supports. See, e*.; Affinity Techniques. Hnzvme Purification - Pan 
B. Methods in Enzvmolopv. Vol. 34, ed. W.B. Jakdby, M, Wilcheck, Acad. Press, 
15 ' NY fl^4): ^femoWiaed Biochemicals and Affinity (hhmmath praphv. Advances in 
Experimental Medicine and Biology, vol. 42, ed. R. Dunlap; Plenum Press, NY 
(1974); U. S. Patent No. 4,762,881; U. S. Patent No. 4,542,102; European Patent 
Publication No. 39i;608 (October 10, 1990); 'U. S. Patent No. 4,992;i27 (Nov. 21, 
1989); - ; , ■ • v . . * : - < ; ^ ••• - 

20 1 One desirable method for attaching 

oligonucleotide/polynucleotide sequences derived from ESTs to a solid support is 
described in International Application No. PCT/US90/06607 published May 30, 
1991). Briefly; this method involves forming predefined regions on a surface of a 
solidsupport, where the predefined regions are capable of immobilizing ESTsl The 

25 methods make use of binding substances attached to the surface which enable 
selective activation of the predefined regions. Upon activation, these binding 
substances become capable of binding and immobilizing 
oligonucleotide/polynucleotides based on EST or longer gene sequences. 

Any of the known solid substrates suitable for binding 

30 oligonucleotide/polynucleotides at pre-defined regions on the surface thereof for 
hybridization and methods for attaching the oligonucleotide/polynucleotides thereto 
may be employed by one of skill in the art according to this invention. Similarly, 
known conventional methods for making hybridization of the immobilized 
oligonucleotide/polynucleotides detectable, e.g., fluorescence, radioactivity, 

35 photoactivation, biotinylation, solid state circuitry, and the like may be used in this 
invention. 

Thus, by resorting to known techniques, the invention provides 
a composition suitable for use in hybridization which consists of a surface of a solid 
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support on which is immobilized at pre-defined regions on said surf ace a plurality of 
r ) defined oligonucleotide/polynucleotide \ sequences for hybridization. « Rrf example, 
; v one composition of this invention is a solid support on which are immobilized oligos 
)hl of EST. fragments from a library . constructed from a single cell type; e.g., t a human 
- 5 ^stem'celli or a single tissue, fe.g^ human Uvct^ a healthy human. ' Still another 
J composition of this inventibn is another solid support ^n which are immobilized 
oligos of EST fragments from a library constructed from a single celltype or a tissue 
; r from* human having a selected disease at predispositon to a selected disease, e.g., 
sliver cancer. ^xv-- * v-J >k nj^-_ c< rv ;J u ^<^<u- <nn, 

A 10 or f . i ; Another embodiment of ,the compositions of this invention 

^ include a single solid support having oligonucleotides of ESTstfrunr both single cell 
or single tissue libraries from both a healthy rand diseased human.; Still other 
embodiments include a single support on which are immobilized oligos of EST 
fragments from more than one tissue or cell library from a healthy human or a single 
15 support on which are immobilized more than, one tissue or cell Ubrary from both 
r healthy and diseased animals or humans. A preferred composition of this invention is 
anticipated to be a single support containing oligos of ESTs for all known cells and 
tissues from a selected organism, K . : \- } .... , v : 

A 20 III. The Methods of the Invention \ j 

A., Identification of Genes . . ^ < ^ 

The- present invention employs the compositions described 
above in methods for identifying genes which are differentially expressed in a normal 
healthy organism and an organism having a disease or infection. These methods may 
.25 ...be employed to detect such genes, regardless of the state, of knowledge about die 
; function of the gene. The method of this invention by use of the compositions 
containing multiple defined EST fragments from a single gene as described above is 
able to detect levels of expression of genes or in other cases simply the expression or 
lack thereof, which differ between normal, healthy organisms and organisms having a 
30 selected disease, disorder or infection. 

One such method employs a first surface of a solid support on 
which is immobilized at pre-defined regions thereon a plurality of defined 
. oHgonucleqtide/polynucleotide sequences, described above, of ESTor longer gene 
fragment isolated from a cDNA library prepared from at least one selected tissue or 
35 cell sample of a healthy animal (the "healthy test surface") and a second such surface 
on which is immobilized at pre-defined regions a plurality of defined 
oligonucleotide/polynucleotide sequences of ESTor longer gene fragment isolated 
from at least one analogous tissue of an animal having a selected disease (the "disease 
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.v.itest: surface"). ^ These 4est; surfaces may ( be, standardized for the selected .animal or 
;ti .selected, cellar ^tissue sample from that animal (i.e./ they are proses 
polymorphisms in the species: population). 

Y/i Polynucleotide sequences are) then isolated from mRNA and/or 
- 5 . r cDNA from a biological sample from, a known healthy animal ("healthy control' 1 ) and 
r a second sample is similarlyiprepared from a sample from 1 a known diseased animal 
t X^sease: sample"). These two samples are desirably selected from the cell or tissue 
i analogous to that which provided the immobilized oligonucleotide/polynucleotides. 
=* V /.v : - ^\ ^nV^-r.'r ! According to : the method ^ the healthy control sample is 
! 10 contacted with one set of the healthy test surfape and the disease test surface 
described above f or 3 ;time sufficient to permit detectable hybridization to occur 
between the sample and the immobilized defined oUgonucleotide/polynucleotides on 
, each surface. The results of this hybridization are a first hybridization pattern formed 

- between the nucleotides of healthy control and the healthy test surface and a seiebnd 
15 u hybridization pattern formed between the nucleotides of healthy control sample and 

the disease test surface. T - " ^ *; ^ - ■ - T -y 

In a similar manner, the disease sample is detectably hybridized 
to another set of healthy test and disease test surfaces, forming a third hybridization 

- pattern between the disease sample and healthy test surface and a fourth hybridization 
20 pattern between the disease sample and the disease test surface. 

Comparing the four hybridization patterns permits detection of 
those defined oligonucleotide/polynucleotides which are differentially expressed 
between the healthy control: and the disease sample by the presence of differences in 
the hybridization patterns at pre-defined regions. The 
' 25 oligonucleotide/polynucleotides on each surface which correspond to the pattern 
differences may be readily identified with the corresponding ESTor longer gene 
fragment from which the oligonucleotide/polynucleotides are obtained. 
*" In another embodiment of the method of this invention, the 

same process is employed, with the exception that plurality of defined 
30 oligonucleotide/polynucleotide sequences forming the healthy test sample and the 
' disease test sample surfaces are immobilized on a single solid support. For example, 
each fragment of an EST or longer gene fragment on the surface is isolated from at 
least two cDNA libraries prepared from a selected cell or tissue sample of a healthy 
animal and an analogous selected cell or tissue sample of an animal having a disease. 
35 According to this embodiment, the healthy control sample is 

detectably hybridized to a copy of this single solid surface, forming one hybridization 
pattern with oligonucleotide/polynucleotides associated with both the healthy and 
diseased animal. Similarly, the disease sample is detectably hybridized to a second 
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copy of this single solid surface/ forming one hybridization pattern with 
oUgonucleotide/polynucleotides associated with both the healthy and diseased animal. 
:*k> 0 .... v/. - , j;i Comparing the. two; hybridisation patterns permits detection of 
\ those defined oligonucleotide/polynucleotides - which r are differentially expressed 
* 5 between lthe4iealthy control and the disease sample by ithe presence of differences in 
: the ( ; V v hybridization v \ patterns ati .pre-defined ? ^regions. ^ ; The 
oligonucleotide/polynucleotides on each . surface wMch f conespond to the pa^ 
. differences mayibe readily identified with the corresponding ESTor longer gene 
fragment from which the oligonudeotide/polynucleotides are obtained. ;, jc ,^ t 
dO , , /The identification of one or nx>re ESTs as the source of the 

• defined oligonucleotide/polynucleotide which produced ;a, "difference" ; ; in 
hybridization patterns according to these methods permits ready identification of the 
gene from which those ESTs were derived. Pecause oligonuleotides are of sufficient 
length that they will hybridize under strnigent concfitiqiK only with a RNA/cDNA for 
15 that gene to which they correspond, the oligo can be used to, identify the EST,and in 
turn the clone; from which it was derived and by subsequent cloning, obtain the 
sequence of the full-length cDNA and its genomic OTunterparts, i.e., the gene, from 
which it was obtained. ]; 

? ( In other words, the ESTs identified by the method of this 
20 invention can be employed to determine the complete sequence of the mRNA, in the 
form of transcribed cDNA, by using the EST as a probe to identify a cDN A clone 
corresponding to a full-length transcript, followed by sequencing of that clone. The 
. EST or the full length cDNA clone can also be used as a probe to identify a genomic 
clone or clones that contain the complete gene .including regulatory and promoter 
25 regions, exons, and introns. 

. . , , :i It should be appreciated that: one does not have to be restricted 
in using ESTs from a particular tissue from which probe RNA or cDNA is obtained, 
rather any or all ESTs (known or unknown) may be placed on the support 
Hybridization will be used a form diagnostic patterns or to identifiy which particular 
30 EST is detected. For example, all known ESTs from an organism are used to produce 
a "master" solid support to which control sample and disease samples are alternately 
hybridized. One then detects a pattern of hybridization associated with the particular 
disaease state which then forms the basis of a diagnostic test or the isolation of 
disease specific ESTs from which the intact gene may be cloned and sequenced 
35 leading ultimately to a defined therapuetic target. 

Methods for obtaining complete gene sequences from ESTs are 
well-known to those of skill in the art. See, generally, Sambrook et al, cited above. 
Briefly, one suitable method involves purifying the DNA from the clone that was 
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sequenced to givethe EST and labeling the isolated^ insert DNA. Suitable labeling 
systems are well known to those of skill in the art [see, eg. Basic Methods in 
Molecular Biology; &/ G: ^ Elsevier Press; NY (1986)]I ( The iabeled 1 

EST insert is then used as a probe to screen 'a lambda phage cDNA library or- a 
5 plasmid cDN A library; identifying colonies amtaining clones related to the probe 
cDNA which can be purified by known methods. The ends of the newly purified 
clones are then sequenced to identify full length sequences and complete sequencing 
of full length clones is performed by enzymatic digestion or primer walking; A 
similar screening and clone selection approach can be applied to clones from a 

10 ' genomic DNA library: 

• 'Additionally;' an EST or gene identified by this method as 

associated with inherited disorders can be used to determine at what stage during 
embryonic development the selected gene from wWch it is derived is developed by 
screening embryonic DNA libraries from various stages of development, e.g. 2-cell, 

15 8-cdlVete:, for the selected gene? : As has been mention^ abewe, the 

be applied in addtional temporal modes for monitoring the progression of a disease 
state, the efficacy of a particular treatment modality or the aging 1 process of an 
individual/ 

Thus, the methods of this invention permit the identification/ 
20 isolation and sequencing of a gene which is differentially expressed in a selected 
disease/infection. As described in more detail below, the identified gene may then be 
employed to obtain any protein encoded thereby, or may be employed as a target for 
diagnostic methods or therapeutic approaches to the treatment of the disease/ 
including, e.g., drug development 
25 The same methods as described above for the identification of 

genes, including genes of unknown function, which are differentially expressed in a 
disease state, may also be employed to identify other genes of interest. For example, 
another embodiment of this invention includes a method for identifying a gene of a 
pathogen which is expressed in a biological sample of an animal infected with that 
30 pathogen or the gene of the host which is altered in its expression as a result of the 
infection. 

One such method employs a healthy test surface as described 
above, employing defined oligonucleotide/polynucleotides from a sample of a 
healthy, uninfected animal. The second such surface has immobilized at pre-defined 
35 regions thereon a plurality of defined oligonucleotide/polynucleotide sequences of 
ESTs isolated from at least one analogous tissue or cell sample of an infected animal 
(the "infection test surface"). Polynucleotide sequences are isolated from a biological 
sample from a healthy animal ("healthy control") and a second sample is similarly 
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prepared from:an animal infected selected pathogen, ("infection sample");, 

These two samples are desirably selected from the cell or tissue analogous ; to that s 
which provided jthe ^ immobilized oligonucleotide/polynucleotides. \t would also be. 
possible to provide samples from the nucleic acidof the pathogen it$el£ , 
5 1 According to the method the healthy ; control sample is 

contacted with one set of the healthy test surface and the infection test surface 
described above for a time sufficient to permit detectable hybridization to ; occur 
between the sample and the immobilized defined oligonucleotide/polynucleotides on 
each surface. The results of this hybridization are a first hybridization pattern formed 

10 between the nucleotides, of healthy control and the healthy test surface and a second, 
hybridization pattern formed between .the nucleotides of healthy control sample and, 
the infection test surface. • . t ^... ; ... -. ■ 

ri 3 I . . • In a similar .manner, the infection sample is detectably 

hybridized to another set of healthy test and infection test surfaces, forming a third 

15 hybridization pattern between the infection sample and healthy test surface and a 
fourth hybridization pattern between the infection sample and the infection test 
surface. 

Comparing the four hybridization patterns permits detection of 
those defined oligonucleotide/polynucleotides which are differentially expressed 

20 between the healthy animal and the animal infected with the pathogen by the presence 
of differences in the hybridization patterns at pre-defined regions. As mentioned 
differential expression is not required and simple qualitative analysis is possible by 
reference to gene expression which is simply present or absent 

A second embodiment of this method parallels the second 

25 embodiment of the method as applied to disease above, i.e., the same process is 
employed, with the exception that plurality of defined oligonucleotide/polynucleotide 
sequences forming the healthy test sample surface and the infection test sample 
surface are immobilized on a single solid support. The resulting first hybridization 
pattern (healthy control sample with healthy/infection test sample) and second 

30 hybridization pattern (infection sample with healthy/infection test sample) permits 
detection of those defined oligonucleotide/polynucleotides which are differentially 
expressed between the healthy control and the infection sample by the presence of 
differences in the hybridization patterns at pre-defined regions. The 
oligonucleotide/polynucleotides on each surface which correspond to the pattern 

35 differences may be readily identified with the corresponding ESTs from which the 
oligonucleotide/polynucleotides are obtained. 

As described above for the methods for identifying differential 
gene expression between diseased and healthy animals, the 
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c^gonuclcotide/polynucleotides on each surface which correspond to the pattern 
differences may be readily identified with the corresponding ESTs from which the 
oligonucleotide/polynucleotide sequences are obtained and genes expressed by the 
pathogen identified for* similar purposes. Other embodiments of these methods may 
5 be developed with resort to the teaching h&rein, by altering the samples which prdvide 
die defined oligonucleotide/polynucleotides. For example/an EST, identified with a 
differentially expressed gene by the method of this invention is also useful In 
detecting ' grates expressed in the various stages of an pathogen's development, 
particularly the infective stage and following the cours of drug treatment : and 

10 emergence of resistant variants. For example, employing "the techniques described 
above, the EST can be used for detecting a gene in various stages of the parasitic 
Plasmodium species lif e cycle, which include blodd stages, Hver stages^ and 
gametocyte stages. *■ " * : '* - -*• ' = ; -hH^i- ;.• i\> f . 
" B: r Diagnostic Methods y " '< rv 1 • Vi 

Is ? ^ In addition to use of the methods' m<l 'comik«ti6ns ^this 

invention for identifying differentially expressed genes, another embodiment of this 
invention provides diagnostic methods for diagnosing a selected disease state; or a 
selected state resulting from aging, exposure to drugs or infection in an animal. 
According to this aspect of the invention, a first surface, described as the healthy test 

20 surface above; and a second surface, described as the disease test surface or infection 
test surface, are prepared depending on the disease or infection to be diagnosed. The 
same processes 1 of detectable hybridization to a first and second set of these surfaces 
with the healthy control sample and disease/infection sample are followed to provide 
the four above-described hybridization patterns, i.e., healthy control sample with 

25 healthy test surface; healthy control sample with disease/infection test surface; 
disease/infection sample with healthy test surface; and disease/infection sample with 
disease/infection test surface. 

The diagnosis of disease or infection is provided by comparing 
the four hybridization patterns. Substantial differences between the first and third 

30 hybridization patterns, respectively, and the second and fourth hybridization patterns, 
respectively, indicate the presence of the selected disease or infection in said animal. 
Substantial similarities in the first and third hybridization patterns and second and 
fourth hybridization patterns indicates the absence of disease or infection. 

A similar embodiment utilizes the single surface bearing both 

35 the healthy test surface defined oligonucleotide/polynucleotides and the 
disease/infection test surface defined oligonucleotide/polynucleotides as described 
above. Parallel process steps as described above for detection of genes differentially 
expressed in disease and infected states are followed, resulting in a first hybridization 
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pattern (healthy control sample with single healthy and disease/infection test sample) 
and a second 1 hybridization pattern (disease/infection sample with another of the 
single heathy * iir.- ^ --leut 

u . . : > : i u \; Diagnosis is accomphshed by con^ 
5 pattems/wherein substantial differences between the first and second hybridization 
patterns indicate the presence of the selected disease or infection in the animal being 
tested. Substantially similar first and second 1 hybridization patterns indicate the 
absence of disease or infection. This like many of the foregoing embodiments may 
use known or unknown ESTs derived from many libraries. 

10 v Ci l Other Methods 

As is obvious to one of skill in the art upon reading this 
disclosure, the compositions and methods of this invention may also be used for other 
similar purposes;' For example/ the general methods and compositions may be 
adapted easily by manipulation of the samples selected to provide the standardized 

15 defined oligonucleotide/polynucleotides, and selection of the * samples selected for 
hybridization thereto; One such modification is the use of this invention to identify 
cell markers of any type, e.g., markers of cancer cells, stem cell markers, and the like. 
Another modification involves the use of the method and compositions to generate 
hybridization patterns useful for forensic identification or an 'expression fingerprint' 

20 of genes for identification of one member of a species from another. Similarly, the 
methods of this invention may. be adapted for use in tissue matching for 
transplantation purposes as well as for molecular histology, i.e., to enable diagnosis of 
disease or disorders in pathology tissue samples such as biopsies. Still another use of 
this method is in monitoring the effects of development and aging upon the gene 

25 expression in a selected animal, by preparing - surfaces bearing 
oligonucleotide/polynucleotides prepared from samples of standardized younger 
members of the species being tested. Additionally the patient can serve as an internal 
control by virtue of having the method applied to blood samples every 5-10 years 
during his lifetime. 

30 Still another intriguing use of this method is in the area of 

monitoring the effects of drugs on gene expression, both in laboratories and during 
. clinical trials with animal, especially humans. Because the method can be readily 
adapted by altering the above parameters, it can essentially be employed to identify 
differentially expressed genes of any organism, at any stage of development, and 

35 under the influence of any factor which can affect gene expression. 
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IV. The Genes and Proteins identified - e ry - c - ^ r ::c 
: ; i - ^ ^Application cff the ©Smpoafibiis ah& methods bf ' ihis : invention ^as 
above described ' also provide other compositions, such as any isolated gene sequencie 
which'i^'^ and aii aninial 

5 having a disease or infection. Another embodiment of this invention is any isolated 
pathogen gene sequence which is expressed in tissue or cell samples of an infected 
animal. Similarly an embodiment of this invention is any gene sequence identified by 
the methods descril^ herein. : ' '~ f /■'''"'--■*>'■-*'' - ■* >'-:^.hr>»^- 

These gene* sequences may 1 be ^ employ^ ^ bbhvendond methods to 

10 produce isolated proteins encoded thereby: " s To produce a proteiii' of this invention, 
the DNA sequences bf a desired' gene identified by the use of the methods }] 6f this 
invention or pibrtiohs thereof ' are inserted uito a * i^&bie V7 fii^'^on r System! 
Desirably, a recombiiiant molecule- or vector is coiiktra in which the 
polynucleotide sequence encoding the protein is operably linked to a heterologous 

15 expression control ^uence pamttihg expression 6f the Human protein! Numerous * 
types of appropriate expression vectors arid Host bell systems are known in the art for 
m a mm a l ian (including human) expression, insect, e.g., baculovirus expression, 1 yeast, 
fungal; and bacterial expression, by standard molecular biology techniques. 

The transfection of these vectors into appropriate host cells, whether 

20 m a mm a l ian, bacterial, fungal, or insect, or into appropriate viruses, can result in' 
expression of the selected proteins. Suitable host cells or cell lines for transfection, 
and viruses, as well as methods for the construction and transfection of such host cells 
and viruses are well-known. Suitable methods for transfection, culture, amplification, 
screening, and product production and purification are also known in the art 

25 The genes and proteins identified by this invention can be employed, if 

desired in diagnostic compositions useful for the diagnosis of a disease or infection 
using conventional diagnostic assays. For example, a diagnostic reagent can be 
developed which detectably targets a gene sequence or protein of this invention in a 
biological sample of an animal. Such a reagent may be a complementary nucleotide 

30 sequence, an antibody (monoclonal, recombinant or polyclonal), or a chemically 
derived agonist or antagonist. Alternatively, the proteins and polynucleotide 
sequences of this invention, fragments of same, or complementary sequences thereto, 
may themselves be useful as diagnostic reagents for diagnosing disease states with 
which the ESTs of the invention are associated. These reagents may optionally be 

35 labelled using diagnostic labels, such as radioactive labels, colorimetric enzyme label 
systems and the like conventionally used in diagnostic or therapeutic methods, e.g, 
Northern and Western blotting, antigen-antibody binding and the like. The selection 
of the appropriate assay format and label system is within the skill of the art and may 
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readily be chosen without requiring additional explanation by resort to the wealth of 
art irithediagnostic area; iAv',*?x*w. ; i,- f vst- r Tfnr . v ;„ 

i.i > u. Additionally, genes and ; proteins identified according to thi$ inyenticm 

i . ..... 

may be used therapeutically. For example, the ES sequences may 

5 be useful in gene therapy, to provide a gene sequence which in a disease is not 
properly or sufficiently expressed. In such a method, a selected, gene sequence of this 
invention is introduced into a suitable vector or other deli very system for delivery to a 
cell containing a defect in; the selected gene;.. Suitable, delivery systems are well 
known to; those of skill in the art and enable, , the : desired tEST or gene to be 

10 incorporated into the target cell and to be translated by; the cell.;, The EST or gene 
sequence may be introduced to mutate the existing gene by recombination or provide 
an active copy thereof in addition to the inactive^gene to replace its function. ; 

w i ; . Alternatively, a protein encoded by an EST or gene of the invention 
may be useful as a therapeutic reagent for delivery of a biologically active protein, 

IS particularly when the disease state is associated with a deficiency of , this protein. 
Such a protein may be incorporated into an appropriate therapeutic formulation, alone 
or in combination with other active ingredients., Methods of formulating such 
therapeutic compositions* as well as suitable pharmaceutical earners, and the like, are 
well known to those of skill in the art. Still an additional method of delivering the 

20 missing protein encoded by an EST, or the gene from which a selected EST was 
derived, involves expressing it directly in vivo. Systems for such in vivo expression 
are well known in the art 

-\ Yet another use of the ESTs, genes identified according to the methods 

of this invention, or the proteins encoded thereby is a target for the screening and 

25 development of natural or synthetic chemical compounds which have utility as 
therapeutic drugs for the treatment of disease states associated with the identified 
genes and ESTs derived therefrom. As one example, a compound capable of binding 
to such a protein encoded by such a gene and either preventing or enhancing its 
biological activity may be a useful drug component for the treatment or prevention of 

30 such disease states. 

Conventional assays and techniques may be used for the screening and 
development of such drugs. As one example, a method for identifying compounds 
which specifically bind to or inhibit or activate proteins encoded by these gene 
sequences can include simply the steps of contacting a selected protein or gene 

35 product, with a test compound to permit binding of the test compound to the protein; 
and determining the amount of test compound, if any, which is bound to the protein. 
Such a method may involve the incubation of the test compound and the protein 
immobilized on a solid support. Still other conventional methods of drug screening 
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can involve employing a suitable computer program to- oeternriW-^ 
similar or complementary icherm^'strucmres- to that'df the'gerie' product or portions 
thereof and screening those 'corapouhdsdther f^^binpe^ve^bln^g^w^p^^ 
to detect enhanced or decreased activity in me presence of die selected con^utta!" s< ll 
t i . ; , Thus, through use of such methodV the present ii^ anticipated 
to provideiebmpcwnds :capable :of intenK^«^-'tfae»-'gene^ ESTs^ or encoded 
proteins, or fragments' thereof; ^and either eiihancing or "decreasing the biological 
activity, lis, desired. Such compounds are believed to be encompassed by this 

invention, i ; 0-..-C 'V. al' i . = . • •. -i, ;.. S-.\ ; ;i ;.'■>-:.... w 5 ;: ■ 

r- . , « . : Numerous modifications and variations of the present invention are 
included in the above-identified specification and are expected to be obvious to one of 
skill in the art Such modifications and alteratidns to the compositions and processes 
of the present invention are believed to be encompassed in the scope of the claims 
appended hereto. . •-• > ■ ■■• 
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WHAT IS CLAIMED IS:. :* .^,., ( dtffrioH:'..'»)\ '^l >^- 

1. A method for identifying genes which are differentially expressed in 
two different pre-determined states of an organism comprising: 
5 ' 5 a. providing a first surf ace on which is immobilized at pre-defined 

regions oh said surface a plurality ' of defined toligonucleotide/polynucleotide 
sequences, each sequence selected from the group consisting of a fragment of an EST, 
an entire EST a fragment of a gene or an entire gene^ isolated from a DNA library , 
prepared from at least one 'selected cell/iissne, organ nr nrgflnism samp le *n a first 
10 state and present in excess relative to the polynucleotide to be hybridized; 

b. providing a second surface on which is immobilized at pre-defined 
regions on said sinface a plurality of defined oligonucleotide/polynucleotide 
sequences, each sequence selected from the group consisting of a fragment of an EST, ; 
an entire EST a fragment of a gene or>an entire gene, isolated from a DNA library 

15 prepared from at least one selected cell, tissue, organ or organism sample in a second 
state and present in excess relative to the polynucleotide to be hybridized; , ,.,«■ 

c. detectably hybridizing to a set of said first and second surfaces 
polynucleotide sequences isolated from a sample from a said organism in said first 
state, said sample selected from sources analogous to the source said 

20 hybridization sufficient to form a first and second hybridization pattern on each said 
first and second surface, 

d. detectably hybridizing to a set of said first and second surfaces 
polynucleotide sequences isolated! from a sample from said organism in said second 
state, said sample selected from sources analogous to the sources of step (c), said 

25 hybridization sufficient to form a third and fourth hybridization pattern on each said 
first and second surface, 

e. comparing at least two of the four hybridization patterns, 
wherein genes differentially expressed in said first and second states are identified by 
the presence of differences in the hybridization patterns at pre-defined regions; 

30 f . identifying the oligonucleotide/polynucleotides on each surface 
which correspond to said pattern differences and the corresponding ESTs or larger 
gene fragment from which the oligonucleotide/polynucleotides were obtained, 
whereby identification of the EST or larger gene fragment permits identification of 
the gene from which the ESTs or larger gene fragment were derived. 

35 
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2. The method according to Qaim 1 wherein said first and second states ait 
respectively healthy and disease; pathogen uninfected and pathogen infected; first 
progression state and asecemdp^ a disease^br infection; a { first ireatmenf 

state and a second treatment state of a disease or 'infection; dir a first' developmental 
5 arid a Second developmental state. ^ : : ; : 

*v- : =3; The method according to Claim- 1 wherein said organism is a plant or an 

animal: T < • c * v = ;v- >v'...:.. .-- .iV ... 

10' > 1 : 4. , The method according to Qaim 3 wherein said aniaml is a human. 

^ ;ri 5. • A method for 1 identifying genes which are differential expressed in a 
normal healthy animal and an animal having a disease comprising: 
> .:<;-■ . a^.o ^providing a first surface on' which is immobilized at p& 
15 defined regidns oh said surface a plurality ' of defined oUgonuclMtid^polyhucleotide 
sequences; each sequence each sequence selected from the group consisting of a 
fragment of an EST, an entire EST a fragment of a gene or an entire gene, isolated 
from a DNA library prepared from at least one selected cell, tissue! organ or organism 
sample in a healthy animal and present in excess relative to the polynucleotide to be 
2d hybridized; 

b: providing a second surface on which is immobilized at pre- 
defined regions of said surface a plurality of defined oligonucleotide/polynucleotide 
sequences, each sequence each sequence selected from the group consisting of a 
fragment of an EST, an entire EST a fragment of a gene or an entire gene, isolated 

25 from a DNA library prepared from at least one selected cell, tissue, organ or organism 
sample from an animal having said disease and present in excess relative to the 
polynucleotide to be hybridized; 

c. deiectably hybridizing to a set of said first and second surfaces 
polynucleotide sequences isolated from a sample from a healthy animal, said sample 

30 selected from sources analogous to the sources of step (a), said hybridization 
sufficient to form a first and second hybridization pattern on each said first and 
second surface, said sample selected from a cell or tissue sample analogous to the 
sample of step (a), said hybridization sufficient to form a first and second 
hybridization pattern on each said first and second surface; 
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5 : : f i <fu< o d: r ?^ra^ft*aabfy hybridizing to a set of said first and second surfaces 

• o. polynucleotide sequences isolated from a sample from an animal having said disease, 
I to: the sample of step (c), 

f ■ "s^ on each 

5 said first and second surf ace, c : ^ . % 

e. comparing at least two of the four hybridization patterns, 
- ; . J! M?r wKerein f genes differentially expressed in saidfirst Md second states "are identified by 
the presence of differences in the hybridization patterns at pre-defined regions; 
. « ? * . ^ ;j f Idenitifying Ae qttgonucleotide/polynuc^ on each surface 

10 / which! correspond to said pattern differences and the corresponding ESTs or larger 
gene fragment from which the oligonucleotide/polynucleotides were obtained, 
. V wher^ larger gene fragment permits identification of 

the genefrom which the ESTs or larger gene fragmem were derived. 

15 6. ; A method for identifying genes which are differentially expressed in a 

normal healthy animal and an animal having a disease comprising: : 

a. providing a surface on which is immobilized at pre-defined 
regions on said surface a plurality of defined oligonucleotide/polynucleotide 
sequences, each sequence selected from the group consisting of a fragment of an EST, 
20 an entire EST a fragment of a gene or an entire gene isolated from a DNA library 
prepared from the group selected from at least one selected cell, tissue, organ or 
organism sample in of a healthy animal and an analogous selected sample of an 
animal having said disease and both present in excess relative to the polynucleotide to 
be hybridized; ' ~ r \ l 

25 V; b. detectably hybridizing to a first copy of said surface 

• polynucleotide sequences isolated from a healthy animal, said sample selected from a 
cell or tissue sample analogous to the sample of step (a), said hybridization sufficient 
to form a first hybridization pattern on said surface; 

c. detectably hybridizing to a second copy of said surface 
30 polynucleotide sequences isolated from an animal having said disease, said sample 

selected from a cell or tissue sample analogous to the sample of step (a), said 
hybridization sufficient to form a second hybridization pattern on said surface; 

d. comparing the two hybridization patterns, wherein genes 
differentially expressed in a disease state are identified by the presence of differences 

35 in the hybridization patterns at pre-defined regions; 
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: ; which isexpressed in a 

biological sample of an animal infected with said pathogen comprising: 

a. providing a first surface on which is immnhiiir^ at pre- 
defined regions on said surface a plurality of defined ohgonucleotide/polynucleotide 
sequences, each sequence selected from the group consisting of a fragment of an EST, 
an entire EST a fragment of a gene or an entire gene isolated from a DNA library 
prepared from at least one selected cell, tissue, organ or organism sample of a 
healthy, uninfected animal and present in excess relative to the polynucleotide to be 
hybridized; 

*5 b. providing a second surface on which is immobilized at pre- 

defined regions of said surface a plurality of defined oligonucleotide/polynucleotide 
sequences, each sequence selected from the group consisting of a fragment of an EST, 
an entire EST a fragment of a gene or an entire gene isolated from at least one 
selected cell, tissue, organ or organism sample of an infected animal; 

20 c. detectably hybridizing to a set of said first and second surfaces 

polynucleotide sequences isolated from a sample from a healthy animal, said sample 
selected from a cell or tissue sample analogous to the sample of step (a), said 
hybridization sufficient to form first and second hybridization patterns on each said 
first and second surface, 

25 d. detectably hybridizing to a set of said first and second surfaces 

polynucleotide sequences isolated from a sample from an infected animal, said 
sample selected from a cell or tissue sample analogous to the sample of step (a), said 
hybridization sufficient to form third and fourth hybridization patterns on each said 
first and second surface, 

30 e. comparing the four hybridization patterns, wherein genes of 

said pathogen which are expressed in an infected animal are identified by the 
presence of differences in the hybridization patterns at pre-defined regions; 

f . identifying the oligonucleotide/polynucleotides on each surface 
which correspond to said pattern differences and the corresponding ESTs from which 

35 the oligonucleotide/polynucleotides are obtained, whereby identification of the EST 
permits identification of the gene from which the ESTs were derived. 
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8. A method for identifying a gene of a pathogen which is expressed in a 
biological sample of an animal infected with said pathogen comprising: 

a. providing a surface on which is immobilized at pre-defined 
regions on said surface a plurality of defined oligonucleotide/polynucleotide 

5 sequences, each sequence selected from the group consisting of a fragment of an EST, 
an entire EST a fragment of a gene or an entire gene isolated from a DNA library 
prepared from the group selected from at least one selected cell, tissue, organ or * 
organism sample in of a healthy animal and an analogous selected sample of an 
animal having said disease and both present in excess relative to the polynucleotide to 
10 be hybridized 

b. detectably hybridizing to a first copy of said surface 
polynucleotide sequences isolated from a sample from a healthy animal, said sample 
selected from a cell or tissue sample analogous to the sample of step (a), said 
hybridization sufficient to form a first hybridization pattern on said surface; 

15 c. detectably hybridizing to a second copy of said surface 

polynucleotide sequences isolated from a sample from an infected animal, said 
sample selected from a cell or tissue sample analogous to the sample of step (a), said 
hybridization sufficient to form a second hybridization pattern on said surface; 

<L comparing the two hybridization patterns, wherein genes of 

20 said pathogen which are expressed in an infected animal are identified by the 
presence of differences in the hybridization patterns at pre-defined regions; 

e. identifying the oligonucleotide/polynucleotides on each surface 
which correspond to said pattern differences and the corresponding ESTs from which 
the oligonucleotide/polynucleotides are obtained, whereby identification of the EST 

25 permits identification of the gene from which the ESTs were derived. 



9. A composition suitable for use in hybridization comprising a solid 
surface on which is immobilized at pre-defined regions on said surface a plurality of 
defined oligonucleotide/polynucleotide sequences for hybridization, each sequence 

30 selected from the group consisting of a fragment of an EST, an entire EST a fragment . 
of a gene or an entire gene isolated from a DNA library prepared from the group 
selected from at least one selected cell, tissue, organ or organism sample of a healthy 
animal, at least one analogous sample of said animal having a disease, at least one 
analogous sample of said animal infected with a microbial pathogen, and any 

35 combination thereof. 
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10. An isolated gene sequence which is differentially expressed in a 
normal healthy animal and an animal having a disease, identified by the method of 
claim 1. 

5 11. An isolated pathogen gene sequence which is expressed in tissue or 

cell samples of an infected animal identified by the method of claim 7. 

12. A diagnostic composition useful for the diagnosis of a disease 
comprising a reagent capable of detectably targeting a gene sequence of claim 10 in a 

10 biological sample of an animal. 

13. A diagnostic composition useful for the diagnosis of infection by a 
pathogen comprising a reagent capable of detectably targeting a gene sequence of 
claim 11 in a biological sample of an animal. 

15 

14. An isolated protein produced by expression of a gene sequence of 
claim 10. 

15. An isolated pathogen protein produced by expression of a gene 
20 sequence of claim 1 1. 

16. A therapeutic composition comprising a protein or fragment thereof 
selected from the group consisting of a protein of claim 10 and a protein of claim IS. 

25 17. A method for diagnosing a selected disease or infection in an animal 

comprising: 

a. providing a first surface on which is immobilized at pre- 
defined regions on said surface a plurality of defined oligonucleotide/polynucleotide 
sequences, each sequence selected from the group consisting of a fragment of an EST, 

30 an entire EST a fragment of a gene or an entire gene, isolated from a DNA library 
prepared from at least one selected cell, tissue, organ or organism sample of a healthy 
animal and present in excess relative to the polynucleotide to be hybridized; 

b. providing a second surface on which is immobilized at pre- 
defined regions of said surface a plurality of defined oligonucleotide/polynucleotide 

35 sequences, each sequence comprising a fragment of an EST isolated from at least one 
said tissue of an animal having said disease; 
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c. detectably hybridizing to a set of said first and second surfaces 
polynucleotide sequences isolated from a DNA library prepared from a sample from a 
healthy animal, said sample selected from a cell or tissue sample analogous to the 
sample of step (a), said hybridization sufficient to form a first and second 
5 hybridization pattern on each said first and second surface; 

<L detectably hybridizing to a set of said first and second surfaces 
polynucleotide sequences isolated from a DNA library prepared from a sample from 
an animal having said disease, said sample selected from a cell or tissue sample 
analogous to the sample of step (c), said hybridization sufficient to form a third and 
10 fourth hybridization pattern on each said first and second surface; 

e. comparing the four hybridization patterns, wherein substantial 
differences between the first and third hybridization patterns and the second and 
fourth hybridization patterns indicates the presence of said selected disease or 
infection in said animal, and substantial similarities in said first and third 
15 hybridization patterns and second and fourth hybridization patterns indicates the 
absence of disease or infection. 

18. A method for diagnosing a selected disease or infection in an animal 
comprising: 

20 a. providing a surface on which is immobilized at pre-defined 

legions on said surface a plurality of defined oligonucleotide/polynucleotide 
sequences, each sequence comprising a fragment of an EST isolated from a DNA 
library prepared from the group consisting of a selected cell or tissue sample of a 
healthy animal and an analogous selected cell or tissue sample of an animal having 

25 said disease; 

b. detectably hybridizing to a first copy of said surface 
polynucleotide sequences isolated from a sample from a healthy animal, said sample 
selected from a cell or tissue sample analogous to the sample of step (a), said 
hybridization sufficient to form a first hybridization pattern on said surface; 

30 c. detectably hybridizing to a second copy of said surface 

polynucleotide sequences isolated from a DNA library prepared from a sample from 
an animal having said disease, said sample selected from a cell or tissue sample 
analogous to the sample of step (a), said hybridization sufficient to form a second 
hybridization pattern on said surface; 

35 d. comparing the two hybridization patterns, wherein substantial 

differences between the first and second hybridization patterns indicates the presence 
of said selected disease or infection in said animal, and substantial similarities in said 
first and second hybridization patterns indicates the absence of disease or infection. 
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a known volume of a reagent at each of a selected tarray position, by tapping a capillary dispenser on the support under conditions effective 
to draw a defined volume of liquid onto the support The apparatus is designed to produce a microarray of such regions in an automated 
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Field of the invention 



This invention relates to a method and apparatus 
for fabricating microarrays of biological samples for 
large scale screening assays, such as arrays of DNA 
samples to be used in DNA hybridization assays for 
genetic research and diagnostic applications. 
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5, , ; ; Southern, ;et (al . /> \Genomics il 3^1008-10 17 : ( 1 99 21) . * 

Backgro und of the Invention f : ,vouiov.; 

v :< A variety : of ^methods* are currently -available for 
making arrays . of s biological ^ macromolecules., such ^as * 

10. arrays of nucleic acid molecules cor proteins . u>orie y 
method for making or dered * array s of DNA on a porous L" 
membrane is .„a »!dot blot" rapproa^. v ln^jtois; , method f a 
vacuum manifold transfers ra .plurality, e.g.. 96/, > ; rht 
aqueous samples .of DNA from 3 millimeter, diameter! wells 

15 to a porous membrane. A common variant of this ii 

procedure is a "slot-blot" method in vhidh the wells o 
have highly-elongated oval shapes* ~i 

The DNA is immobilized on the porous membrane by 
baking the membrane pr exposing it to JJV. radiation. V 

20 This is a manual procedure practical for making one 
array :at a time and usually limited to 96 samples per 
array . : "Dot-rblot procedures are f therefore = inadequate 
for applications in which many, thousand samples must be 
determined.,.; \, u ■ ^ -;.v^- 

25 A more efficients technique employed -for- making 

ordered, .arrays of genomic f ragments uses an array of . 
pins dipped into, the wells, e.g., the 96 wells of a 
microtitre plate, for transferring an array of samples 
to a substrate, such as a porous membrane. One array 

30 includes pins that are designed to spot a membrane in a 
staggered fashion, for creating an array of 9216 spots 
in a 22 x 22 cm area (Lehrach, et al., 1990). A ...... 

limitation with this approach is that the volume of DNA 
spotted in each pixel of each array is highly variable. 
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In t addition, the ■> number, of .-.arrays-, that *can be made with 
each;: dipping -is usually ^ite small:.. t ^>< Iohk ? 
a i An alternate ;i .method x>f ; f creating ordered ,;arrays of 
nucleic acid sequences is described by Pirrung, et ^2 • 
5 (1992;),, and also by * Fodpr,, et .4! .*■..■( 1991)! ♦ i ,The method H 
involves synthesizing .different nucleic acid sequences 
at different discrete xegions Qf ,a ^upport.u This r . tu 
method employs elaborate synthetic schemes, iand is . to 
generally limited ^tp; relatively short -pupleic acid 

10 sample/ e.g. , less than 2D bases. A related method has 
been described by Southern, et al. 0 (l$?2)v. ;i f 
? o :- v Khrapko, ;i et al- (199J) describes ^ ^a r me1^od i: pf: 
making an oligonucleotide matrix by spotting onto a 
thin layer ;of >polyacry^ is vdone/ 1 v 

15 manually with a micropipette* ; . - - ^ 

^ None of the methods or devices .described in the 
prior- art are designed for mass fabrication o'f 
microarrays characterized . by (i) a large number of 
micro-sized assay regions separated by a distance of 

20 50-200 microns or less, and (ii) a veil-defined, amount, 
typically in the picomole range, of analyte associated 
with each region of the array. , ;< t 

Furthermore, current technology is directed at 
performing such assays one at a time to a single array 

25 of DNA molecules. For example, the. most common method 
for performing DNA hybridizations to arrays spotted 
onto porous membrane involves sealing the membrane in a 
plastic bag (Maniatas, et al., 1989) or a rotating 
glass cylinder (Robbins Scientific) with the labe led 

30 hybridization probe inside the sealed chamber. For 
arrays made on non-porous surfaces, such as a 
microscope slide, each array is incubated with the 
labeled hybridization probe sealed under a coverslip. 
These techniques require a separate sealed chamber for 



WO95/M505 



PCt/US95/07659 



4 

each array » which makes <th€i ^screening and handling >Of 
many tsuch -arrays ' inconvenient arid time diiterisive . i 
t>vi>i \Atoouzied, cetcal.' (1994) describes a ^method of 
printing horizontal "lines of antibodies Von a * * ,J 
5 nitrocellulose membrane and separating regions of the ' 1 
membrane^ with vertical stripes of ^a hydrophobic - 1 
material. a Bach vert i^ a 
different ^antigen >and the reaction between s the ; < 
immobilized aritibody and an antigen- is detected using a 
10 standard ELISA colorimetric technique v Abouzied ' -e* v 
technique ^makes it possible -to screen many one--'-- ' 
dimensional arrays simultaneously on a single sheet- of 
nitrocellulose •.. Abouz ied makes : the nitrocellulose ^ 
somewhat hydrophobic using a line drawn -with 'PAP Pen ^ 
15 (Research Products International) ... However' Abouzied 
does ,not describe a technology that is capable of 
completely sealing the pores of the nitrocellulose. The 
pores of the nitrocellulose are: still physically open 
and so the assay reagents can leak through the 
20 hydrophobic barrier during extended high temperature 
incubations or in the presence of detergents which 
makes the Abouzied technique unacceptable for DNA 
hybridization assays; ; . 

, - Forous membranes with printed patterns of 
25 hydrophilic/hydrophobic regions exist for applications 
such as ordered arrays of bacteria colonies. QA Life 
Sciences (San Diego CA) makes such a membrane with a 
grid pattern printed on it. ■ However, this membrane has 
the same disadvantage as the Abouzied technique since 
30 reagents can still flow between the gridded arrays ^ 
making them unusable for separate DNA hybridization » 
assays. 

Pall Corporation make a 96-well plate with a 
porous filter heat sealed to the bottom of the plate. 
35 These plates are capable of containing different 
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reagents xm each >well without . cross^contamination . ; ; < 
However > f each; well is intended to hold only orie- target 
elementi whereas the. invention described , her e makes .»;!;.*■ 
microarray of jnany biomolecules in ^each subdivided 
5: : region-of i the^solijd; support. Furthermore, the 96 well 
plates, are a^ least Mo cm. thick; and prevent, the use of 4 . 5 >,. 
thec device v f or .many, colorimetric, f luorescent \-;and. *- 
radioactive detection formats which* require othat.the. 
membrane lie>- f lat against ±he detection * surface . ■ * ,The 

10 invention described -here requires no further processing 
after the assay step since the barriers elements are 
shallow and do not interf ere with the detection step v 
thereby greatly increasing convenience. , .. 

Hyseq Corporation has. described a method of making 

15 an "array of -arrays": on a non-porous solid support for 
use with their sequencing by hybridization technique. 
The method described by Hyseq involves modifying the 
chemistry of the solid support material to form a 
hydrophobic grid pattern where each subdivided region 

20 contains a microarray * of biomolecules. Hyseq 's f lat; ' - 
hydrophobic pattern does not make use of physical 
blocking as an additional means of preventing cross 
contamination. 

25 BummarY the Syreptjon 

The invention includes, in one aspect, a method of 
forming a microarray of analyte-assay regions on a 
solid support, where each region in the array has a 
known amount of a selected, analyte-specif ic reagent. 

30 The method involves first loading a solution of a 
selected analyte-specif ic reagent in a reagent- 
dispensing device having an elongate capillary channel 
(i) formed by spaced-apart, coextensive elongate 
members, (ii) adapted to hold a quantity of the reagent 

35 solution and (iii) having a tip region at which aqueous 
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solution in the channel forms a meniscus. The channel 
Is preferably formed by a- pair of spaced-apart tapered 

t element&. (1 < j^oAvt:.* :;;V..-cW r--K-' v-.. .r-**^v' 
!)vi„ The tip of the dispensing device is tapped against 
5 a solid support at a defined position on the support 

surface with, an impose effective to break^the meniscus 
in the capillary channel deposit a selected volume of 
solution on the surf ace, preferably a selected volume 
in the range 0.01 to 100 nl. tl The two steps are 

10 repeated:. 'Until, the -desired array; is. formed. >KMt -, 

; ; ^ . f The method ,may be practiced in forming a plurality 
of- such arrays, where the solution-depositing step is 
are applied to a selected position, on each of a 
plurality . of solid supports at each, repeat *, cycle. . : . 

15 r: . The dispensing device may be loaded with a new 

solution, by the steps of (i) dipping the capillary 
channel of the device in a wash solution, (ii) removing 
wash solution drawn into the capillary. channel, and 
(iii) dipping the capillary channel into the new 

20 reagent solution. r ; 

Also included in the invention is an automated 
apparatus . for forming a microarray of analyte-assay 
regions on a plurality of solid supports,, where each 
region in the array has a known amount of a selected, 

25 analyte-specific reagent. The apparatus has a holder 
for holding, at known positions, a plurality of planar 
supports, and a reagent dispensing device of the type 
described above. 

The apparatus further includes positioning 

30 structure for positioning the dispensing, device at a 
selected array position with respect to a support in 
said holder, and dispensing structure for moving the 
dispensing device into tapping engagement against a 
support with a selected impulse effective to deposit a 
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selected volume on ;<the support .,-.,e.g. a (1 selected volume 
in the volume^range^Q n 01. to 100 nl. 

The positioning; «and, dispensing f sti^ctures ^?:e 
controlled by <a .control ^nit^in ;the apparatus v . The 
5 unit, operates to (i) ; place ^the, dis^ensing^.deyice at a 
.loading station,^ (ii) t .move the, capillary; channel in the 
device Into a selected reagent at the -loading station, 
to .load the dispensing device; ? with the ; reagent . c , and > 
4 iii) dispense the reagent at a defined array position 

10 on each of the supports on said holder . , : The i unitmay 
further operate, r at >the end , of , a ;disp^sing, c J9ycle r to 
wash the, dispensing device by (i) placing the ^ v 
dispensing device at a washing station fXl (ii). ; moving the 
capillary channel! in the device intp r a wash f luid, to 

15 load the dispensing device with the fluid, and (iii) 
remove the wash fluid prior to loading the dispensing 
device with a fresh selected reagent. r , • \ Viv ,^ 

The dispensing device in the apparatus may be one 
of a plurality of such devices which are carried on the 

20 arm for dispensing dif f erent = analyte assay reagents at 
selected spaced array positions. 

In another aspect , the invention includes a 
substrate with a surface having .a microarray v of at 
least 10 3 distinct polynucleotide or polypeptide 

25 biopolymers in a surface area of less than about 1 cm 2 . 
Each distinct biopolymer (i) r is disposed at a separate, 
defined position in said array, (ii) has a length of at 
least 50 subunits, and (iii) is present in a defined 
amount between about 0.1 femtomoles and 100 nanomoles. 

30 In one embodiment, the surface is glass slide 

surface coated with a polycationic polymer, such as 
poly lysine, and the biopolymers are polynucleotides. 
In another embodiment, the substrate has a water- 
impermeable backing, a water-permeable film formed on 
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; the ^ bapking , , end, > a grid s formed . on the ,f i 1m . ; ;The r gr id 
K ; is, composed of . intersecting water- impervious grid 
elements extending from said backing to positions 
} rais^d c ^w pjf said u film, and partitions 

f 5 the film intp; a,, plurality; of water-impervious cells. A 
a bippQlymeri .array is. formed within each well. , ?, /;oi \ 
or 'i j i ■ r More ; t genera 1 ly, ^ there is* prpvided a a> substrate for 
t ^e , in detecting binding- of n labeled polynucleotides to 
one or^ : more f of; ^| pliirality different-seguenqe, ( , 
10 ^ i^pbilized . polynucleotides . ; . The substrate includes, . 
y- *>P e aspect fV a ; glass support,, e ^coating t of a; 
: i P^lypationic^pplymexr ,. such as poly lysine 9 » on said r , , . , i 
. sur f ace of ; the siappo^ 

. polynucleotides electrostatically r bound t npn-covalently 

15 to said coating, where, each distinct biopplymer is 

disposed at a separate, defined position in a surf ace 
ariray -of c polynucleotides. ^, t , 

i f s In smother aspect, the substrate includes a water- 
impermeable backing , a water-permeable film formed on 

20 /the backing, and a grid formed on the film, where the 
. grid is composed of intersecting water-impervious grid 
elements extending from the backing to positions raised 
above the surf ace ? of the film, forming a plurality of 
, cells f . A biopolymer array is formed within- each cell. 

25 Alsp forming part ; of the invention, is a method of 

detecting differential expression of each of a 
plurality of genes in a first cell type, with respect 
to expression of the same genes in a second cell type. 
In practicing the method, there is first produced 

30 fluorescent-labeled cDNA's from mRNA's isolated from 
the two cells types, where the cDNA'S from the first 
and second cells are labeled with first and second 
different fluorescent reporters. 

A mixture of the labeled cDNA's from the two cell 

35 types is added to an array of polynucleotides 
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representing a .plurality u pf Known genes derived, from 
the^two; cell types, under conditions that result in 
hybf idizatipn^pf : the c^ 

pplynupleotides in ; thie array,. t T|ie f ,array ,is then ; 
5 examined, py } fluorescence .under fluorescence, excitation 
conditions v in -which (i) polynucleotides in the array 
that are hybridized predominantly, to cDNA's ; derived 
fronr qjxB: of ,the r £ irst and c second^cell types give : a 
distinct first; orrsecondr fljiprescence emission: color, 

10 respectively, ^ apd^tii) polynucleotides; in the array 

that are hybridized to substantially equal numbers of 
cDNA's derived from the first , and r second , ce 11^ types , 
give a distinct combined fluorescence, emission color, 
respectively. . The relative expression of known genes 

15 in the two cell types can then be determined by the 
observed fluorescence emission color of each spot. 

These and other objects and features of the 
invention will become more fully apparent, when the 
following detailed description of the invention is read 

20 in conjunction with the accompanying figures. 

Brief Description of the Prayings 

Fig, 1 is a side view of a reagent-dispensing 
device having a open-capillary dispensing head 
25 constructed for ixse in one embodiment of the invention; 

Figs. 2A-2C illustrate steps in the delivery of a 
fixed-volume bead on a hydrophobic surface employing 
the dispensing head from Fig. 1, in accordance with one 
embodiment of the method of the invention; 
30 Fig. 3 shows a portion of a two-dimensional array 

of analyte-assay regions constructed according to the 
method of the invention; 

Fig. 4 is a planar view showing components of an 
automated apparatus for forming arrays in accordance 
35 with the invention. 
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c . .^Eig.. ^ 20 x 

,20 array , ; of 4 op ffuores^ ; ... 

c : immobilized pn a r pqly-l-lysine... coated slide, where the 
.^total^rea covered by , the . 4 0Q element array is 16 

r 5 ^square, millimeters; , ^ tyK . , ...... : .. 

r .Fig. 6 is^a fluorescent image, of a 1. 8 cm x 1.8 cm 
.microarray i; containing laaibda clones with yeast inserts, 
the fluorescent signal arising, from , the hybridization 
v to the array with, approximately ,telf > the t yeast genome 
, 10 labeled . with : a green fluorophore an£ , the other half 
with a red fluorophore; 

Fig* 7 shows the translation of the, hybridization 
image of Fig. 6 into a karyotype of the yeast genome, 
where the elements of Fig. -6 micrparray' contain yeast 

15 DNA sequences that have been previously physically 
mapped in the yeast genome; 

Fig. 8 show a. fluorescent image of a 0.5 cm x 0.5 
cm microarray of 24 cDNA clones, where the microarray 
was hybridized simultaneously with total cDNA from wild 

20 type Arabidopsis plant labeled with a green fluorophore 
and total cDNA from a transgenic Arabidopsis plant 
labeled with a red fluorophore, and the arrow points to 
the cDNA clone representing the gene introduced into 
the transgenic Arabidopsis plant; 

25 Fig. 9 shows a .plan view. of substrate having an 

array of cells formed by barrier elements in the form 
of a grid; 

Fig. 10 shows an enlarged plan view of one of the 
cells in the substrate in Fig. 9, showing an array of 
30 polynucleotide regions in the cell; 

Fig. 11 is an enlarged sectional view of the 
substrate in Fig. 9, taken along a section line in that 
figure; and 

Fig. 12 is a scanned image of a 3 cm x 3 cm 
35 nitrocellulose solid support containing four identical 
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arrays c of M13. clones in lea.ch >■* of f v our quadrants ; < where 
each quadrant >yas, ^hybridized simultaneously ^to> a ; > 
different, oligonucleotide using an open, face.. :;] \, ; v 
hybridization -method.; . ; ; ^ - i / * v 

r ..-ii'. Detailed Description of the Invention 

i. DgginitiQng - ^ . v.,-., : . 

( Unless indicated otherwise, the terms defined 
below have the following meanings: ; ; ^ 

1Q - "Ligand". refers to .one member of a ligand/anti- 
ligand binding pair ♦ The ligand may be , for example , 
one .of; the nucleic acid strands in a complementary , 
hybridized nucleic acid duplex > binding ■ pair ; , an . ; ■ : 
effector molecule in an • effector /receptor binding pair ; 

15 or an antigen in an antigen/ antibody or 
antigen/ antibody fragment binding pair. 

"Antiligand" refers to the opposite member of a 
ligand/anti-ligand binding pair. The antiligand may be 
the other of the nucleic acid strands in a 

20 complementary, hybridized nucleic acid duplex binding 
pair; the receptor molecule in an effector /receptor 
binding pair; or an antibody or antibody fragment 
molecule in antigen/ antibody or antigen/antibody 
fragment binding pair ; respectively 

25 "Analyte" or "analyte molecule" refers to a 

molecule, typically a macromolecule, such as a 
polynucleotide or polypeptide, whose presence, amount, 
and/or identity are to be determined. The analyte is 
one member of a ligand/anti-ligand pair. 

30 "Analyte-specific assay reagent" refers to a 

molecule effective to bind specifically to an analyte 
molecule. The reagent is the opposite member of a 
ligand/anti-ligand binding pair. 

An "array of regions on a solid support" is a 

35 linear or two-dimensional array of preferably discrete 
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regions, each having a finite area, formed on the 
surface of .a .solid support. , , . . ^ 

A "mii^roarray" is an, arra^ib^i^ a 
density of discrete jregionsj of at, least about 100/cm 2 , 
5 and preferably at < least about lOOO/cm 2 . 0 The regions in 
a microarray have typical dimensions^ e.g. ] diameters, 
in the range of Between about 10-250 firn^ and ?are 
separated from other regions in the ^array "by about the 
same distance. 

10 A support surf ace is "hydrophobic" if" a aqueous- ' 

medium droplet' applied to''th£'siur¥a^ not spread 

out substantially beyond the area size of the applied 
droplet • That is , the surface acts to ' preveht 
spreading t>f the droplet applied to the surf ace by 

15 hydrophobic interaction with the dropletT 

A "meniscus" means a concave or convex surface 
that forms on the bbttom of a liquid in a channel ks a 
result of the surf ace tension of the liquid. 

"Distinct biopolymers" , as applied to the 

20 biopolymers' forming a microarray, means an array member 
which is distinct from other array members on the basis 
of a different biopolymer sequence, and/ or different ' 
concentrations of the same or distinct biopolymers, r 
and/ or different mixtures of distinct or different- 

25 concentration biopolymers. Thus an array of ^distinct 
polynucleotides" means an array containing, as its 
members, (i) distinct polynucleotides, which may have a 
defined amount in each member, (ii) different, graded 
concentrations of given-sequence polynucleotides, 

30 and/or (iii) different-composition mixtures of two or 
more distinct polynucleotides. 

"Cell type" means a cell from a given source, 
e.g., a tissue, or organ, or a cell in a given state of 
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..differentiation, .-,or.<a cell< associated: with.,a igiven? 
^pathology : ©r . genetic ^ makeup 1 . ;•'.) - .*»>•? <■>•.». ; ". 

Ox'i *■ t' .' •*:/.»■ m'S i. •!! *,„:.;< i,.":!"'- '•; , ..L i C. >•■'•' ' ■. •V.'f.i'il' (Xj 

11 ^ - Method Microarrav Formation, ; .? ^: 
5 ; ;c; ^This section ^describes ar method of forming a 

microarray-of i? analyte-assay regions on a solid, support 
or substrate, where each region in ' the * array; has > a 
-known amount, of r a selected, r * analyte-specif ic ireagent ; 1 
Fig. X illustrates, vdn a^partially.: schematic view, 
10 a reagentrdispenging device 10 useful in practicing the 
method* The device generally includes a< reagent r 
dispenser 12 having an elongate open capillary chahrifel 
14 adapted to hold a quantity of the reagent solution; 
such as indicated at 16 , as will be described below . 

15 The capillary channel is formed by a pair of spaced- 

apart, coextensive, elongate members 12a, 12b which are 
tapered toward one another and converge at a tip or tip 
region 18 at the lower end of the channel. More 
generally, the open channel is formed by at least two 

20 elongate, spaced- apart members adapted to hold a 

quantity of reagent solutions and having a tip region 
at which aqueous solution in the channel forms a 
meniscus , such as the concave meniscus illustrated at 
20 in Pig. 2 A. The advantages of the open channel 

•25 construction of the dispenser are discussed below. 

With continued reference to Fig. 1, the dispenser 
device also includes structure for moving the dispenser 
rapidly toward and away from a support surface, for 
effecting deposition of a known amount of solution in 

30 the dispenser on a support, as will be described below 
with reference to Figs. 2A-2C In the embodiment 
shown, this structure includes a solenoid 22 which is 
activatable to draw a solenoid piston 24 rapidly 
downwardly, then release the piston, e.g., under spring 

35 bias, to a normal, raised position, as shown. The 
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dispenser is carried -on . the piston by -- r a ^connecting 
member 26, as 7 shown . The just-described moving 
structure is alsb : referred to?herein^as dispensing 
"mfeans ' for moving the dispenser into engagement >with >a : 
5 solid support > for dispensing' a known volume of fluid 
; ontlie r support i 

'The dispensing device just described isrcarried on 
ah arm 28 that may be moved either linearly cor : in ran l-x- 
•y; plane to position the dispenser at a selected 
10 deposition position, as will be described. - > 

m pig S # r ,2A-2C: illustrate :the method of depositing a 
known ^amount of reagent solution in the just-described 
dispenser on the surf ace of a . solid support, such as 
the support indicated at " 3 0 . The support is a polymer , 
" 15 glass, or other solid-material support having a surf ace 
Indicated at 31*" : 

In one general embodiment, the surface is a 
relatively hydrophilic, i.e., wettable surface, such as 
a surface having native, bound or covalently attached 
20 charged groups. On such -surface described below is a 
.glass surf ace having an absorbed layer of a t; 
polycat ionic polymer, such as polypi-* lysine. * 

In another embodiment, the surf ace has or is 
formed to have a relatively hydrophobic character, 
25 i.e.,. one that causes aqueous medium deposited on the 
surf ace to bead. A variety of known hydrophobic 
polymers, such as polystyrene, polypropylene, or 
polyethylene have desired hydrophobic properties, as do 
glass and a variety of lubricant or other hydrophobic 
30 films that may be applied to the support surface. 

Initially, the dispenser is loaded with a selected 
analyte-specific reagent solution, such as by dipping 
the dispenser tip, after washing, into a solution of 
the reagent, and allowing filling by capillary flow 
35 into the dispenser channel. The dispenser is now moved 
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to-a 'selected position with respect to a ! "support u: " 
Surface, * placing the dispenser tip directly above the 
support-surface jk>siitibri^ to be 

deposited^ This 'movement -takes plade with the rc ' ' * " 
5 "dispenser tip in its raised position'/ as seen in Fisf. 
2A, where the tip is typically - at least s^veraf 1^5 mm 
above the surface ' bf the "substrate;- ri ::> " r ~ " 1 nc 
With the dispenser so positioned; sblenbid 22 is r 
now activated tb : cause the dispenser ti!p to inbve 

10 rapidly toward and away ^ rbm 1 the substr aite ' sxir f ace , 
itakirig mom In effect, 

tapping the tip of the dispenser against the support 
surface. Thfe ; tapping movement of the tip against the 
surf ace 1 acts to break the liquid meniscus in the tip' 

15 channel, bringing "'the liquid in the tip into contact 
with -the support surface. Tfils, in turn, produces a 
flowing of the liquid into the capillary space between 
the tip and the surface, acting to draw liquid out of 
the dispenser channel, as seen in Fig. 2B. 

20 Fig. 2C shows flow of fluid from the tip onto the 

support surface, which in this case is a hydrophobic 
surface. The figure illustrates that liquid continues 
to flow from the dispenser onto the support surface 
until it forms a liquid bead 32. At a given bead size, 

25 I.e., volume, the tendency of liquid to flow onto ttie 
surface will be balanced by the hydrophobic surface 
interaction of the bead with the support surface, which 
acts to limit the total bead area on the surface, and 
by the surface tension of the droplet, which tends 

30 toward a given bead curvature. At this point, a given 
bead volume will have formed, and continued contact of 
the dispenser tip with the bead, as the dispenser tip 
is being withdrawn, will have little or no effect on 
bead volume. 
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ci ,r : < -for liquid-dispensing on -a- more hydrbi>hilid r 1 
Surface; 1 the 'liquid will have less of a tendency to 
bead; and the dispensed vblume will be m<ire sensitive 
\ tbthe 1 totals-dwell ; time v bf the dispenser 'tip ih ; the 
; 5 ^Immediate:'' vicinity of the - support -surf e .g ; > the 
v positions' illustrated ' in : Figs . > 2B and 2C. 

The desired deposition volume, ile.y bead volume, 
formed by this method is preferably in the range 2 pi 
V (picolitefs) 5 td 2V nl* ( nanol iters) , although volumes as 
HO i high *as 100 nl or more may be dispensed. It will be 
appreciated that the selected dispensed volume will 
vdependv on? (i) -the "footprint? i,of the- dispenser tip, 
li .e-, „ ;the;,size of the area spanned by the tip, ( ii) the 
whydrophobicity of the .r support t surface, and (iii) the 
15 time of contact with and rate of withdrawal of the tip 
from the support surface. In addition, bead size may 
be reduced by increasing the viscosity of the medium, 
. effectively reducing the flow time of liquid from the 
dispenser onto the support surface. The drop size may 
20 be further constrained by depositing the drop in a 
hydrophilic region surrounded by a hydrophobic grid 
pattern x>n the support surface, 

In a < typical embodiment , the dispenser tip is 
tapped rapidly against the support surface; with a 
25 total residence time in contact with the support of 
less than about 1 msec, and a rate of upward travel 
^from the surf ace of about 10 cm/sec. 

Assuming that the bead that forms on contact with 
the surface is a hemispherical bead, with a diameter 
30 approximately equal to the width of the dispenser tip, 
as shown in Fig. 2C, the volume of the bead formed in 
relation to dispenser tip width (d) is given in Table 1 
below. As seen, the volume of the bead ranges between 
2 pi to 2 nl as the width size is increased from about 
35 20 to 200 /xm. 
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volume Xnll; *j 




■ 2 x i(y* ! ^^ 


50 fim 


3.1 x 10' 2 


100 jum 




1 200 joT 


2 



10 ; ^ At _ a given tip size, bead volume can be reduced in 
a controlled fashion by increasing surface 
hydrophobic^, ^reducing time, of, contact { of f the tip (1 
with the pur face, . increasing rate of movement of the 
tip away from the surface, and/or increasing the 

15 viscosity of the medium. Once these parameters are 

fixed, a selected , deposition volume in the, desired , pi 
to nl range can be achieved in a repeatable fashion. 

After depositing a bead at one selected location 
on a support, the tip is typically moved to a : 

20 corresponding position on a second .support , a droplet 
is deposited at that position, and this process is . 
repeated until a liquid droplet of the reagent has been 
deposited at a selected position on each of a plurality 
of supports. 

25 The tip is then washed to remove the reagent 

liquid, filled with another reagent liquid and this 
reagent is now deposited at each another array position 
on each of the supports. In one embodiment, the tip is 
washed and refilled by the steps of (i) dipping the 

30 capillary channel of the device in a wash solution, 
(ii) removing wash solution drawn into the capillary 
channel, and (iii) dipping the capillary channel into 
the new reagent solution. 

From the foregoing, it will be appreciated that 

35 the tweezers- like, open-capillary dispenser tip 
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provides the advantages that °( i) the open channel'* lo'£- a > ; 
the tip* facilitates orapid, 'efficient washing and idrying 
before reloading ^ the *lp with-* hew reagent ; jfii) r ^ t 
passive capillary action "can load the sample directly 
5 r from* standard iiiicrdwell ; plat^ while 'retaining ^-v o 
sufficient sample ' iLri the -open f capillairy ^reserVoxr f oif- : 
the * printing of numerous 'arrays , q> ( iii) open capillaries 
are ' less prone to clogging than closed capillaries , arid 
(iv) open capillaries do nbt require ><a ^perfectly faced 

10 ; bottom surface for fluid delivery. 

A portion jof a^microarray^36 formed on the surface 
38 of a solid support *4CK in accordance .with rthe ^method 
just described is shown in Pig. :'3.<,-*.--.-Ihev:array.,±s f ormed r 
of a plurality of analyte-specif ic reagent regions , •< 

15 1 such as regions 42 , where each region may include a < * - - 
different analyte-specif ic reagent. ,As indicated 
above, the diameter of each region is preferably *r : .-. r: 
between about 20-200 ^m. The spacing between each 
region and its closest (non-diagonal) neighbor, ; :> 

20 measured f rom center-to-rcenter *( indicated at 44 ) ., is , 

pref erably in the .range »of about 20-400 jnn. Thus, for : > 
example, an array having ;a center-to^center. spacing ;of 
about 250 Mm rcontains about 40 regions /cm or 1,600 
regions/cm 2 . After formation of the array, the support 

25 is treated to evaporate the liquid of the droplet 

forming each region; to leave a desired array of dried, 
relatively flat regions. This drying may be done by 
heating or under vacuum. 

In some cases, it is desired to first rehydrate 

30 the droplets containing the analyte reagents to allow 
for more time for adsorption to the solid support. It 
is also possible to spot out the analyte reagents in a 
humid environment so that droplets do not dry until the 
arraying operation is complete. 
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Automated Apparatus for Forming -Arravs; v « : 

: In anotjier aspect, the ^invention Includes van 
automated apparatus ^for> forming an ;ain?ay of oanalyte- 
assay : regions on «a, solid support^ where each ^ region irir 
5 the c array ^has a known amount of a selected , mnalyte- m a 
specific reagent.. r.-oi.»..>v Ap**-. a. or 

: : The apparatus. is shown in planar; and partially 
schematic view -in. Fig. 4^:-. A dispenser device 72 in the 
apparatus has the. basic construction described above 

10 with respect to Fig- 1, and includes a dispenser 74 

having an open-capillary channel terminating at a "tip, 
substantially as shown in Figs. 1 and 2A-2C. 

The dispenser is mounted in the device for ' 
movement toward and away from a dispensing position at 

15 which the tip of the dispenser taps a support surface, 
to dispense a selected volume of reagent solution, as 
described above. This movement is effected by a ^ 
solenoid 76 as described above. Solenoid 76 is under 
the control of a control uriit 77 whose operation will 

20 be described below. The solenoid is also referred to 
herein as dispensing means for moving the device into 
tapping engagement with a support; when the device is 
positioned at a defined array position with respect to 
that support. ■ . 

25 The dispenser device is carried on an arm 74 which 

is threadedly mounted on a worm screw 80 driven 
(rotated) in a desired direction by a stepper motor 82 
also under the control of unit 77. At its left end in 
the figure screw 80 is carried in a sleeve 84 for 

30 rotation about the screw axis. At its other end, the 
screw is mounted to the drive shaft of the stepper 
motor, which in turn is carried on a sleeve 86. The 
dispenser device, worm screw, the two sleeves mounting 
the worm screw, and the stepper motor used in moving 

35 - the device in the "x" (horizontal) direction in the 
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figure form what is referred to here collectively as a 
displacement assembly 86. . 

. The displacement assembly is constructed to 
produce precise, micro-range movement in the direction 
5 of the screw, i.e., along an x axis in the figure. In 
one mode, the assembly functions to move the dispenser 
in x-axis increments having a selected distance in the 
range 5-25 urn. In another mode, the dispenser unit may 
be moved in precise x-axis, increments of several 

10 microns or more,; for positioning the dispenser at 

associated positions on adjacent supports, as will be 
described below. 

The displacement assembly, in turn, is mounted for 
movement in the M y" (vertical) axis of the figure, for 

15 positioning the dispenser at a selected y axis 

position. The structure mounting the assembly includes 
a fixed rod 88 mounted rigidly between a pair 'of -frame 
bars 90, 92, and a worm screw 94 mounted for rotation 
between a pair. of frame bars 96, 98. The worm screw is 

20 driven (rotated) by a stepper motor 100 which operates 
under the control of unit 77. The motor is mounted on 
bar 96, as shown. 

The structure just described, including worm screw 
94 and motor 100, is constructed to produce precise, 

25 micro-range movement in the direction of the screw, 
i.e., along an y axis in the figure. As above, the 
structure functions in one mode to move the dispenser 
in y-axis increments having a selected distance in the 
range 5-250 pm, and in a second mode, to move the 

30 dispenser in precise y-axis Increments of several 

microns (jua) or more, for positioning the dispenser at 
associated positions on adjacent supports. 

The displacement assembly and structure for moving 
this assembly in the y axis are referred to herein 

35 collectively as positioning means for positioning the 
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dispensing vdevice;atx a iseiected-array ' posit ionowith 
respect to a support. . . *"i -:.v;. >c.i> ^r-t 

; holder 102 in the apparatus functions to hold a 
plurality of supports, * such as supports s 104 on .which :> 
5; the ^micrdarxaysi of regent; regions =are. to i be formed by> 
the apparatus. - The holder provides^ a number of > 
recessed • slots r such sis slot .106 ; i which receive the 
supports, and position them at precise selected 
positions with respect 'to othet frame bars on which the 

10 dispenser ^moving means is mounted^ i< . t -#r.y. -v. 

< < > As noted above, the control unit ih the device 
functions ; to actuate the two stepper motors and' ' Vi ; 
dispenser solenoid in a sequence designed for automated 
operation of the - apparatus in forming • a - selected 1 

15 microarray of reagent regions on each of a plurality of 

SUppOrtS..\ , : - • » ^ ,\- r < 

" ^ The control unit is constructed, -according to 
conventional microprocessor control principles, to 
provide appropriate signals to each of the solenoid and 

20 each of the stepper motors, in a given timed sequence 
and for appropriate signalling time. The construction 
of the unit, and the settings that are selected by the 1 
user to achieve a desired array pattern, will be 
understood from the following description of a typical 

25 apparatus operation* 

Initially, one or more supports are placed in one 
or more slots in the holder; The dispenser is then 
moved to a position directly above a well (not shown) 
containing a solution of the first reagent to be 

30 dispensed on the support (s) • The dispenser solenoid is 
actuated now to lower the dispenser tip into this well, 
causing the capillary channel in the dispenser to fill. 
Motors 82, 100 are now actuated to position the 
dispenser at a selected array position at the first of 

35 the supports. Solenoid actuation of the dispenser is 
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then effective; to- dispense a. selected- volume droplet of 
that Reagent at thisMocatibftv As noted aboVey ri thi^ nc * 
operation' : is ?. ef f ectiveJ to^ dispense a selected volume 
preferably between : 2 pl^and 2 nl? of the 1 reagent ' * 
5 solution*^: i : i^v- jM^t'ioi- si';- p;v".; ? 'r su* " [ i 

iv i . . oThe dispenser is now moved to *he ^ corresponding 
position at an adjacent support and a similar volume J of 
the. solution is dispensed /at ithis position The > : > r 
process is repeated iintilr.the reagent has been ^ j »v 

10 dispensed at this preselected corresponding position on 
each of the supports. olA-y; -^r-^.. /C.a 

,r - Where it is desired to dispense a single reagent 
at more > than two array positions on a support ) * the ; 
dispenser may be moved to different array positions >f at 

15 each, support , bef ore moving the^ dispenser to a new 
support, or solution can be dispensed at individual 
positions on each ^support > rat, one selected position, 
then the cycle repeated, for each new ; array i position.^ 
To dispense the next reagent, the dispenser is 

20 positioned over a wash solution (not shown) v and ^the^ 
dispenser , tip is dippefd in and out of this solution " 
until , the reagent solution has been substantially r 
washed from the tip. Solution can be removed from the 
tip, after each dipping, by vacuum, compressed air 

25 spray/ sponge, or the like. u . o u 

, The dispenser tip is now dipped in a second 
reagent well, and the filled tip is moved to a second 
selected array position in the first support. The 
process of dispensing reagent at each of the 

30 corresponding second-array positions is then carried as 
above. This process is repeated until an entire 
microarray of reagent solutions on each of the supports 
has been formed. 
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n • . v - rt i This section , describes^embpdironts, of a ^substrate 
^haying, a piicroargray , of ^biological; polymers carried on 
q substrate^ surface,^ ; Subsection A, describes a mult i- 
j ce^l jsubsteatey. ea^ cell- , pf which contains a 
. 5 microarrajy, and pref erably an identical; microarr ay, of 
: distinct r ^ippplyiaere such as distinct > polynucleotides , 
r formed ^on^a porous surface,. Subsection B describes la 

micrparray; of .disjtinct pplynucleotldes: toound, onl axglass 
, slide coated with n^a t polycatipnic ; polymer,, u ;k; 

Multi— Cell Substrate H M or. uai::.- ,Hr- 
• v - >Fig. ] 9. illustrates, in plan view, : a '* substrate 110 
constructed according itovthes invention; . vf The- substrate 
,has an ;8- x 12 rectangular -array 112 of cells, such as 

15 * cells 114, 116, formed on the substrate surface. With 
reference to t Fig ; 10 , r each cell , such as- cell 114> in 
turn supports a micrbeonray 118 of distinct biopolymers, 
such. as polypeptides or polynucleotides at known, 
addressable regions of the microarray. Two such 

20 regions forming the microarray are indicated at 120, 

and correspond to regions, such as regions 42 , forming 
the microarray of distinct biopolymers shown in Fig. 3. 

The 9 6 -cell arrays shown in Fig. 9 has typically 
array dimensions between about 12 and 244 mm in width 

25 and 8 and 400 mm in length, with the cells in the array 
having width and length dimension of 1/12 and 1/8 the 
array width and length dimensions, respectively, i.e. , 
between about 1 and 20 in width and 1 and 50 mm in 
length. ^ 

30 The construction of substrate is shown cross- 

sectionally in Fig. 11, which is an enlarged sectional 
view taken along view line 124 in Fig. 9. The 
substrate includes a water- impermeable backing 126, 
such as a glass slide or rigid polymer sheet. Formed 

35 on the surface of the backing is a water-permeable film 
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128 . j The: film is formed- of a porous r membrane; material , 
such as nitrocellulose membrane, u or » av porous web - 
material^! sucir asr a inyloriy :polypropylene , ? or; *PVDF 
porous polymer material, ; Thecthickness of the film is 
5 preferably between about 10 and 1000 jinu- The film may 
toe applied 'to ;the backinglby spraying or coating- 
uncured < material *. oil the , backing , - of' by > apply ing a* . 
preformed. membrane .to the backing, ^ The backing and i; u 
film may be obtained as a preformed unit from 

10 commercial source, e.g., a plastic-backed 

nitrocellulose film available from Schleicher and 
Schue 11 Corporation. ^ v ; > ^; j u-. : >; ^ „, e. ; v; ;i -v^^; 

With continued referenced to Fig. 11/ the film-^/ 
covered surface in the substrate is partitioned, into a 

15 desired array of cells by water-impermeable grid lines, 
:such as lines 130, 132, , which have infiltrated the film 
down toi the level of the backing, and extend above the 
surf ace; .of* the film as shown, typically a distance of 
100 to 2000 Mm above the film surface. 

20 The grid lines are formed on the substrate by , v 

laying down an uncured or otherwise f lovable resin or 
elastomer solution in an array grid , , > allowing: . the 
material to infiltrate the porous film down to the 
backing, then curing or otherwise hardening the grid 

25 lines to form the cell-array substrate. 

One preferred material for the grid is a flowable 
silicone available from Loctite Corporation. The 
barrier material can be extruded through a narrow 
syringe (e.g., 22 gauge) using air pressure or t 

30 mechanical pressure. The syringe is moved relative to 
the solid support to print the barrier elements as a 
grid pattern.. The extruded bead of silicone wicks into 
the pores of the solid support and cures to form a 
shallow waterproof barrier separating the regions of 

35 the solid support. 
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In alternative embodiments, the barrier element 
can be a wax-based material or a thermoset material 
such as epoxy? the barrier material can ' also be a UV- 
curing^pblymer which light: after being 

5 printed onto the solid support! The barrier material 
may also be applied ib tiie solid support using printing 
techniques such as silk-screen printing! ' The barrier 
material may also be a heat-seal stamping of the porous 
solid support which seals its pores and forms a water- 

10 impervious' barrier element. The barrier material may 

J also be a* shallow grid which is laminated or otherwise 
'adhered to the solid support . ; ' ' ^ ^ " 

In addition to plastic-lDacked nitrocellulose, the 
solid support can be virtually any' porous membrane with 

15 or without a non-porous backing. Such membranes are 
readily available from numerous vendors and are made 
from nylon , PVDF, polysulfone and the like. In an 
alternative embodiment, the barrier element may also be 
used to adhere the porous membrane to a non-porous 

20 backing in addition to functioning as a barrier to 
prevent cross contamination of the assay reagents. 

In an alternative embodiment, the solid support 
can be of a non-porous material. The barrier can be 
printed either before or after the microarray of 

25 biomolecules is printed on the solid support. 

As can be appreciated, the cells formed by the 
grid lines and the underlying backing are water- 
impermeable, having side barriers projecting above the 
porous film in the cells. Thus, defined- volume samples 

30 can be placed in each well without risk of cross- 
contamination with sample material in adjacent cells. 
In Fig. 11, defined volumes samples, such as sample 
134, are shown in the cells. 

As noted above, each well contains a microarray of 

35 distinct biopolymers. In one general embodiment, the 
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microarrays in the well are identical arrays of -.v. 
distinct .biopolymers, >e+g. /different sequence 
polynucleotides. Such arrays ccan^be .formed in ; cj ^ { ( 
accordance with ^the methods: described < in, Sectional! , by 
5 depositing a first ^selected rpolynucleotideriat the same 
selected microarray position in < each ;Of the cells, .then 
depositing a second i polynucleotide at a different . , , 
microarray position in each well; ia**d so on until : ?a-, 
complete/: identical microarray ; is ; formed. in ,,each cell. 

10 - In vai preferred embodiment, reach^microarray, . : . , 
contains about 10 3 , distinct . polynucleotide or r 
polypeptide biopolymers per surf ace area of less than 
about 1 cm?.. Also in a pref erred: ^embodiment, the j : : 
biopolymers in each microarray region are present in ;a 

15 defined amount between about 0 .1 f emtomoles and ,100 

nanomolies. The ability to form high-density arrays of 
biopolymers , where each region is formed of a well - 
defined amount of deposited material, can be achieved 
in accordance with the microarray-forming method 

20 described in Section II. 

Also in a preferred embodiments, the biopolymers 
are polynucleotides having lengths of at least about 50 
bp, substantially longer, than oligonucleotides, 

which can be f ormed in highrdensity arrays, by schemes ~ 

25 involving parallel, step-wise polymer synthesis on the 
array surface.- 

In the case of a polynucleotide array, in an assay 
procedure, a small volume of the labeled DNA probe 
mixture in a standard hybridization solution is loaded 

30 onto each cell. The solution will spread to cover the 
entire microarray and stop at the barrier elements. 
The solid support is then incubated in a humid chamber 
at the appropriate temperature as required by the 
assay. 
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^ ^ Each assay may be* conducted' in -an '"iopen^facen 
f brmat where 7 nb h further sealing step is- required / rsince 
the^ hybridization solution will <be Jcept ^properly 
hydrated * by the water- vapor' 1 .in the humid' chamber. At 
5' ; the cone lus ion ; of the -incubation .step,: :fche .«ritirerisolid 
support containing the 1 numerous * microarraysi. < is <r insed 
quickly- enough to 1 dilute the assay .^reagents*: So that no 
significant cross contamination occurs— TheUentdre 
solid support >is ( then reacted with jdetection, reagents 

10 if needed ; and analyzed using :>standardccolor imetr ic , 
radioactive or fluorescent ^detection means. :» VAll » # i 
processing and detection steps are performed: < i 
simultaneously to all ; of the ; micr ©arrays on >*he solid 
support ensuring uniform -assay conditions for all of 

15 the microarrays on the solid support. 

f B. Glass-Slide Polynuc leotide • Array - 

Fig. 5 shows a substrate 136 formed according to 
another aspect of the invention, and intended for use 

20 in detecting ^binding bf labeled polynucleotides to one 
or more of a r plurallity distinct f polynucleotides. The 
substrate includes - : ia glass substoate 138 Jhaving formed 
on its surface, a coating of a polycationic polymer, 
preferably a cationic polypeptide , such as poly lysine 

25 or polyarginine. Formed on the polycationic coating is 
a microarray 140 of distinct polynucleotides , each 
localized at known selected array regions, such as 
regions 142 . 

The slide is coated by placing a uniform-thickness 
30 film of a polycationic polymer, e.g. ; poly-l-lysine, on 
the surf ace of a slide and drying the film to form a 
dried coating. The amount of polycationic polymer 
added is sufficient to form at least a monolayer of 
polymers on the glass surface. The polymer film is 
35 bound to surface via electrostatic binding between 



"Wtt95/3S505 



; PCT/tS9S/lir7<S9 



28 

hegative j silyl-OH groups^ bri^thfe surface arid charged 
amine groups r 'ih the polymers. ; Pbly-l-lysine coated 
glass 1 '"SiideS ;, 'Aay i ' ; b'e y obta 

Sigma Chemical Co* v (St. Louis, MO) i J i c > ^ ^ : * 
5 -^ l 0 <to form' -1*e l atlferbiarray > * dfef ineia ^voittmfes x ot' :y " ; ' n 
distinct polynucleotides are^ deposited 7 bri the polymer- 
coated slide, as desciribfed infection ii . : According ; to 
ah important' feature 1 5 of th6 'substrate;, 1 ^ the v deposited ° 
polynucleotides remain bound to the coated slide ' ^ ry 

10 surface non^covaiently 1 whten an"- aquetous DNA sample is 
applied to the -substrate undet conditions Which *ilow 
hybridization - A of reporter-labeled polyriiicleotides in 
the: sample to complementary-sequence (single-stranded) 
polynucleotides in the substrate array; The ^method is 

15 illustrated in Examples 1 and 2: ~ v ^ v " 

To illustrate this feature/ a substrate bf the 
type just described, but having an array of same- 
sequence polynucleotides, was mixed with fluorescent- 
labeled complementary 1 DNA tinder hybridization ' 

20 conditions. After washing to remove non-hybridized 
material, ' the substrate was examined by low-power 
fluorescence microscopy. The array can be visualized 
by the relatively uniform labeling pattern of the array 
regions. r 1 •:-■<.- 

25 In a preferred embodiment, each microarray 

contains at least 10 3 distinct polynucleotide or 
polypeptide biopolymers per surface area of less than 
about 1 cm 3 . In the embodiment shown in Fig. 5, the 
microarray contains 400 regions in an area of about 16 

30 mm 2 , or 2.5 x 10 3 regions/cm 2 . Also in a preferred 

embodiment, the polynucleotides in the each microarray 
region are present in a defined amount between about 
0.1 femtomoles and 100 nanomoles in the case of 
polynucleotides. As above, the ability to form high- 
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density -arrays-of this 'type, Where Vach region is 
formed bf ! a Wl^defiried ; amount *of deposited mktefrial, 
can be achieved in accordance witlv the ^ microarray- 
forming method described ; 'in Section II. 
5 Also "in -a preferred ^mbddiriehts , the P- * r 

polyhucleotiides have 1 lengths of ' at least about 50 hpr 
i^^ .y substantially' longer than oligonucleotides^ which 
can be formed in v high-density arrays by r vairlous ih situ 
synthesis schemes. ■aV-' •..■r:. " ■;■ 

V. Utility : - - 

i ^ Microarrays of inmobilized nucleic acid 'sequences 
prepared in accordance with the invention can he used 
for large scale hybridization assays ih numerous 

15 genetic applications / including genetic -and physical 

mapping of genomes, monitoring of gene expression, DNA 
sequencing;' genetic diagnosis, genotyping of organisms, 
and distribution of DNA reagents to researchers. 

For gene mapping, a gene or a cloned DNA fragment 

20 is hybridized to an ordered array of DNA fragments, and 
the identity of the DNA elements applied to the array 
is unambiguously established by the pixel or pattern of 
pixels of the array that are detected* One application 
of such arrays- for creating a genetic map is described 

25 by Nelson, et al . ;(1993) . In constructing physical 
maps of the genome, arrays of immobilized cloned DNA 
fragments are hybridized with other cloned DNA 
fragments to establish whether the cloned fragments in 
the probe mixture overlap and are therefore contiguous 

30 to the immobilized clones on the array. For example, 
Lehrach, et a!., describe such a process. 

The arrays of immobilized DNA fragments may also 
be used for genetic diagnostics. To illustrate, an 
array containing multiple forms of a mutated gene or 

35 genes can be probed with a labeled mixture of a 
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patient 4 s DNA which^wiil preferentially j interact -with 
only' one of the immobilized versions of -the gene; ^ 
" u ; The detection br this iriteractiort cahyieadcto ;a r 
medicai diagnosis . Arrays of immob i 1 i z ed l DNA 1 fragments 
5 ctah also °be used in DNA probe diagnostics; >^For ^ 

example, the identity of a pathogenic microorganism can 
be established unambiguously by hybridi2ing a sample of 
^e unkndwn pathogen ' s> DNA^ to ; an array containing -many 
types 1 of known pathogenic ' DNA'; A similar technique can 

it) also be used for janambiguous genotypihg of any< > 

organism* other molecules of genetic interest, such as 
cDNA's and RNA's can be immobilized on the array or 
alternately used as the labeled probe mixture that is 
applied to the array. . ; : w> , ♦ . 

15 in one application, an array of cDNA clones 

representing genes is hybridized with total cDNA f rom <> 
an organism to monitor gene expression f or research or 
diagnostic purposes. Labeling total cDNA from a normal 
cell with one color fluorophore and total cDNA from a 

20 diseased cell with another color fluorophore and 

simultaneously hybridizing the two cDNA samples to the 
same array of cDNA clones allows for differential gene 
expression to be measured as the ratio of the two 
fluorophore intensities. This two-color experiment can 

25 be used to monitor gene expression in different tissue 
types, disease states, response to drugs, or response 
to environmental factors. & An example of this approach 
is illustrated in Examples 2 , described with respect to 
Pig. 8. 

30 By way of example and without implying a 

limitation of scope, such a procedure could be used to 
simultaneously screen many patients against all known 
mutations in a disease gene. This invention could be 
used in the form of, for example, 96 identical 0.9 cm x 

35 2.2 cm microarrays fabricated on a single 12 cm x 18 cm 
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sheet of 1 plastic-backed nitrocellulose where) each ^ 
pmicroarray} could contain, for example; 100 DNA i i*- 
p f ragments representing all known mutations - of a given 
r gene . 1 1 1 Thfe : region < of inter est^ f rom each of s the DNA - 

US samples from 96 patients could be amplified, labeled, 
and hybridized- to the 96. individual arrays with each 
assay performed in 100 microliters of hybridization 
1 solution. The approximately l > ^thic)c : silicone rubber 
barrier- elements between individual arrays ; prevent ■"sv-u i. 

10 - cross contamination of the:, patient samples toy sealing 
the pores of the nitrocellulose and by acting as ? a^v*.- 
physical barrier between each microarray . 'The solid 
support containing all 96 microarrays assayed with the 
96 patient samples is incubated, rinsed, detected and 

15 analyzed as a single sheet of material using standard 
radioactive, fluorescent, or color imetric detection 
-means (Maniatas, et al : , 1989)v - Previously, such 
procedure would involve the handling, processing and 
tracking of 96 separate membranes in 96 separate sealed 

20 chambers . By processing all 96 arrays as a single 

sheet of material, significant time and cost savings-' 
are possible. 

The assay format can be reversed where the patient 
or organism' s DNA is immobilized as the array elements 

25 and each array is hybridized with a different mutated 
allele or genetic marker. The gridded solid support 
can also be used for parallel non-DNA ELISA assays* 
Furthermore, the invention allows for the use of all 
standard detection methods without the need to remove 

30 the shallow barrier elements t:o carry but the detection 
step. 

In addition to the genetic applications listed 
above, arrays of whole cells, peptides, enzymes, 
antibodies, antigens, receptors, ligands, 
35 phospholipids, polymers, drug cogener preparations or 
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chemical substances can be f abricated by the means 
described in this invention for large scale screening 
assays in medical diagnostics, ^drug idis«>very,uv 
molecular biology-, immunology and toxicology; 7-1, ^ 

^5 The tmult i-cell -J subistrate < aspect ;ofi the? < invention; 

allows for 'the rapid and convenient screening many 
DNA probes against many ordered arrays of DNA' • i ;u n L 
fragments. This el iminates> the need to handle and 
detect many individual arrays for. performing mass r 

10 /screenings for .genetic research and diagnostic 

applications. Numerous microarrays can be (fabricated 
on the same solid support and each microarray reacted 
with a different DNA probe while the. solid support is 
processed as a single sheet of materials k< u 

The following examples illustrate, but in no way 
are intended to limit, the present invention. 

Example 1 

20 Genomic-Complexitv Hybridization to Micro 

DNA Arrays Representing the Yeast 
SaccharomycBs cerevisiae Genome with 
Two-Cblor Fluorescent Detection 

The array elements were randomly amplified PCR ; 

25 ( Bohlander , et al., 1992 ) products using physically 
mapped lambda clones of S\ cerevisiae genomic DNA 
templates (Riles, et al., 1993). The PGR was performed 
directly on the lambda phage lysates resulting in an 
amplification of both the 35 kb lambda vector and the 

30 5-15 kb yeast insert sequences in the form of a uniform 
distribution of PCR product between 250-1500 base pairs 
in length. The PCR product was purified using 
Sephadex G50 gel filtration (Pharmacia, Piscataway, NJ) 
and concentrated by evaporation to dryness at room 

35 temperature overnight. Each of the 864 amplified 
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preparation -for spotting onto the glass . 

The micro ^arrays were fabricated'on microscope" 
slides which were coated with at layer of ~poly-l- lysine 
5 (Sigma) i^The automated apparatus described in ; Section* < 
IV loaded 1^1 j of the concentrated ( lambda clone PCR : 
product in 3 " x *SSC directly from r 96 well* isto'rage plates 
into 'the open capillary printing' element "arid deposited 
^5 nlldf sample per 6lide at 380 micron spacing between 

10 spots> on each; of 40 slides. :The process^ was ' repeated 
f or : all 864 samples and ; 8 control ^pots . 'After * the 
spotting ' operation was complete y the slide's were ' ! 
rehydrated in a humid chamber for 2 hours/ baked in a 
dry 80° vacuum oven > f or -i hours , { rinsed to "rembve un- 

15 absorbed DNA and then treated with ■ succinic anhydride 
to reduce non-specific adsorption of the labeled 
hybridization probe to the poly-l-lysine coated glass 
surface; ** Immediately prior to use, the immobilized DNA 
on the array was denatured in distilled water at 90 ° 

20 for 2 minutes: 

v : For the pooled chromosome -experiment , the 16 
chromosomes of Saccharomyces cerevisiae were separated 
in a CHEF agarose gel apparatus (Biorad, Richmond, CA) . 
The six largest chromosomes were isolated in one gel 

25 slice and the smallest 10 chromosomes in a second gel 
slice. The DNA was recovered using a gel extraction 
kit (Qiagen, Chatsworth, CA) . The two chromosome pools 
were randomly amplified in a manner similar to that 
used for the target lambda clones. Following 

30 amplification, 5 micrograms of each of the amplified 
chromosome pools were separately random-primer labeled 
using Klenow polymerase (Amersham, Arlington Heights, 
IL) with a lissamine conjugated nucleotide analog 
(Dupont NEN, Boston, MA) for the pool containing the 

35 six largest chromosomes, and with a fluorescein 
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con jugatedv. nucleotide analog (BMB) for the pool 
containing smallest ten chromosomes. The two pools 
were mixed and concentrated us ingan ultrafiltration 
device (Amicon, Dan vers, MA) • w > r 
5 / * Five juicrograms; of »,the hybridization ^probe- 

consisting of both chromosome pools in ^7. 5 fil of TE was 
denatured in a boiling water- bath and *hen ^nap cooled 
^on^icBri 2.54 °f concentrated hybridization sqlutipn 
:(S *XiSSC and-O.a* .SDS)- was added and all .10 ; m1 ^, ■ 

10 'transferred to, the array, surface,, covered with cover 
slip; placed in a custom-built single-slide humidity 
chamber and incubated at >60° foz\12 . hours,... y The slides 
were then rinsed at room temperature in 0.1 x SSC and 
0.1%SDS for 5 minutes, cover slipped and scanned. 

15 j ' A custom, built laser fluorescent scanner was. used 
to detect the two-color hybridization signals, from the 
1.8 x 1.8 cm ; array ;at 20 micron resolution. The 
scanned image was gridded and analyzed using custom 
image analysis software. After correcting for optical 

20 crosstalk between the fluorophores due to their 
overlapping emission spectra, the red and green 
hybridization values, for each clone on the array were 
-correlated to the known physical map position of the 
clone resulting in a computer-generated color karyotype 

25 of the yeast genome. 

Figure 6 shows the hybridization pattern of the 
two chromosome pools. A red signal indicates that the 
lambda clone on the array surface contains a cloned 
genomic DNA segment from one of the largest six yeast 

30 chromosomes. A green signal indicates that the lambda 
clone insert comes from one of the smallest ten yeast 
chromosomes. Orange signals indicate repetitive 
sequences which cross hybridized to both chromosome 
pools. Control spots on the array confirm that the 

35 hybridization is specific and reproducible. 
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The physical* nap locations of ^ the 1 genomic DNA ; - 
fragments contained in each of the clones used as array 
elements* have been previously s determined- by Olson and 
co-workers (Riles , v et : al.v ) .allowing for the , : automatic 
5 generation of the color karyotype! shown -in -Figure -7 . 
The color ;/of a chromosomal section on the karyotype 
corresponds to the color of « the array- element 
containing -the ( clone :from that section. The black 
regions of the karyotype represent false negative dark 
10 spots on the .array (10%) or regions* of the genome not 
covered by the Olson clone library (90%) . Note that 
the largest - six chromosomes jure mainly red while the 
smallest ten chromosomes are mainly green matching the 
original CHEF gel isolation of the hybridization probe. 
15 Areas of the red chromosomes containing green spots and 
vice-versa are probably due to spurious sample tracking 
errors in the formation of the original llbr airy and in 
the amplification and spotting procedures. 

The yeast genome arrays have also been probed with 
20 individual clones or pools of clones that are 

f luorescently labeled for physical mapping purposes . 
The hybridization signals of these clones to the array 
were translated into a position on the physical map of 
yeast. - ■ .. ■ 

25 - < . 

Example 2 

Total cDNA Hybridized to Micro Arrays of 
cDNA CJLoi)es wi,th Two-Co^pr 
Fluorescent Detection 

30 24 clones containing cDNA inserts from the plant 

Arabidopsis were amplified using PGR. Salt was added 
to the purified PCR products to a final concentration 
of 3 x SSC. The cDNA clones were spotted on poly-1- 
lysine coated microscope slides in a manner similar to 

35 Example 1. Among the cDNA clones was a clone 
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representing ^transcription factor HAT -4 > which had 
previously been used to create a transgenic line <of the 
£lant' AraMtfbpsisy - in which - this gene is present * at : ten 
times : the level ; found in wild-type Arabidopsis ^Schena , 
5 et alt, 1992); 

Total poly-A mRNA from wild type Arabidopsis was 
isolated using standard 'methods (Maniatis> et al. /- 
1989) ' and reverse 1 transcribed into 1 total cDNA; ^ using 
fluorescein nucleotide analog to label the cDNA product 

io (green fluorescence) v A similar procedureiwaso:, *^ ^ 
performed with the transgenic' line of Arabidopsis where 
the 'transcription factor HAT4 was inserted .into the i 
genome using: standard gene transfer protocols . cDNA 
copies of mRNA from the transgenic plant are .labeled 

15 with a lissamine nucleotide analog (red fluorescence). 
Two' micrograms of the cDNA products from each type of 
plant were pooled together and hybridized to the cDNA 
clone- array in a It) microliter hybridization reaction* 
in a manner similar to Example 1* Rinsing and 

20 detection of hybridization was also performed in a 

manner similar to Example 1. Fig. 8 show the resulting 
hybridization pattern of the array. 

Genes equally expressed in wild type and the 
transgenic Arabidopsis, appeared yellow due to equal 

25 contributions of the green and red fluorescence to the 
final signal. The dots are different intensities of 
yellow indicating various levels of gene expression. 
The cDNA clone representing the transcription factor 
HAT4 , expressed in the transgenic line of Arabidopsis 

30 but not detectably expressed in wild type Arabidopsis, 
appears as a red dot (with the arrow pointing to it) , 
indicating the preferential expression of the 
transcription factor in the red-labeled transgenic 
Arabidopsis and the relative lack of expression of the 
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transcription factor in .the green- labeled wild type 
Arabiddpsis^ i ^i^ ' < : . ; u sr., ; . > } :?^;< ^ 
» r An advantage uof^the microarray hybridisation ^^'t- 
format for gene expression studies -is the high partial 
5^ concentration of each cDNA species achievable in the 10 
microliter hybridization reaction. This high partial 
concentration allows or detection >of ' rare transcripts 
without the need f 6t PGR amplification >bf ^ the , ' v ' " 

hybridization probe which may bias the true genetic 

10 representation of each discrete cDNA species. 

Gene expression studies such as these can be used 
for genomics research to discover which genes are 
expressed in which : cell types ; disease states , ? 
development states or environmental conditions. Gene 

15 expression studies can also be used for diagnosis of 
disease by empirically correlating "gene expression 
patterns to disease states . : f 

20 Multiplexed Colorimetric Hybridization on 

a Gridded Solid Support 

A sheet of plastic-backed nitrocellulose was 

gridded with barrier elements made from silicone rubber 

according to the description in Section ; IVr A. The 

25 sheet was soaked in 10 x SSC and allowed to dry. As 

shown in Fig. 12, 192 M13 clones each with a different 
yeast inserts were arrayed 400 microns apart in four - 
quadrants of the solid support using the automated 
device described in Section III. The bottom left 

30 quadrant served as a negative control for hybridization 
while each of the other three quadrants was hybridized 
simultaneously with a different oligonucleotide using 
the open-face hybridization technology described in 
Section IV-A. The first two and last four elements of 
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each array are positive controls for the coiorimetric 
detection step. •*'• p* ' r -n-.. .--v r~ i.r* ^^vsvr^: '' ; 

v The oligonucleotides were labeled with fluorescein 
which was detected using an anti-f lubrescein ^antibody 
5 -conjugated to alkaline phosphatase that^precipitated an 
'NBT/BCIP dye" on the solid support (Amershzun)^ Perfect 
matches between the labeled oligos and the H13 clones 
resulted in dark spots visible to the naked eye and 
detected using an optical scanner (HP ScanJet cED) 

10 attached to a- personal computer . ^ The hybr idiz ation 
patterns are different in every quadrant indicating 
that each oligo found several unique M13 clones from 
among the 192 with a perfect sequence match. Note that 
the open capillary printing tip leaves detectable ; 

15 dimples on the nitrocellulose which can be used to 
automatically align and analyze the images. 

Although the invention has been described with 
respect to specific embodiments and methods, it will be 
20 clear that various changes and modification may be made 
without departing from the invention. 
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IT Is'OAlrfED:^'^ * ! " 1 ■ °"' ''''' 

1. A method of formxng a mxcroarray of analyte- 
assay 'regions on 'a each region in 

5 the array has* a known aiount of a selected/' anaiyte- 
Spelcif id reagent/ said methc^ comprising :> 

'('&')' loading a solution of a selected analyte- 
sp'eclfic reagent in a reiagent-dispensihg ''device having 
an elongate capillary channel (i) formed by spaced- 
10 apart, coextensive elongate members, (ii) adapted to 
hold a quantity of the reagent solution and (jiiij 
having a tip regidn at which aqueous solution iii the 
channel forms a meniscus, 

(b) tapping the tip of the dispensing device 

15 against a solid support at a defined position on the 
surface, with an impulse effective to break the 
meniscus in the capillary channel and deposit a 
selected volume of solution on the surface, and 

(c) repeating steps (a) and (b) until said array 
20 is formed. 



2. The method of claim 1, wherein said tapping is 
carried out with an impulse effective to deposit a 
selected volume in the volume range between 0.01 to 100 

25 nl. 

3. The method of claim 1, wherein said channel is 
formed by a pair of spaced-apart tapered elements. 

30 4. The method of claim l, for forming a plurality 

of such arrays, wherein step (b) is applied to a 
selected position on each of a plurality of solid 
supports at each repeat cycle proceeding step (c) . 
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5 . \ The mettfod'of ycl'aiigf V, .^which further includes , 
after performing steps (a) and: (b) at least one time, 
reloading the rea^gep^ -dispensing device with a new 
reagent solution by tip steps of (i) dipping the 
5 capillary channel of the device! in a wash solution, 
(ii) removing v wash solution drawn intoy&ha ^capillary 
channel/ and^ (iii) dipping tlie; capillary channel into 
the new reagent solution. 

10 6 . Automated apparatus for forming a microarray 

of analyte-assay regions on a plurality of solid 
supports, where each region in the array has a known 
amount of a selected, analyte-specif ic reagent, said 
apparatus comprising 

15 (a) a holder for holding, at known positions, a 

plurality/ of planar' supports , 

(b) a reagent dispensing device having ah open 
capillary channel (i) formed by spaced-apart, 
coextensive elongate members (ii) adapted to hold a 

20 quantity of the reagent solution and (iii) haying a tip 
region at which aqueous solution in the channel forms a 
meniscus, -y.\ J: \.' : 

(c) positioning means for positioning the 
dispensing device at a selected array position with 

25 respect to a support in said holder, 

(d) dispensing means for moving the device into 
tapping engagement against a support with a selected 
impulse, when the device is positioned at a defined 
array position with respect to that support, with an 

30 impulse effective to break the meniscus of liquid in 
the capillary channel and deposit a selected volume of 
solution on the surface, and 

(e) control means for controlling said positioning 
and dispensing means. 

35 
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7. : The apparatus bie claini 6, wherein said 

dispensing /means is Effective to .move said dispensing 
device against a suppdrt with, an: impulse effective to 
deposit a selected volume ih the volume range between 
o.oi to doo")nl; 1: (\ " • v ; 



; B; \ The apparatus of ■ claim 6, wherein said channel 
is formed by a pair of spaced -apart tapered elements. 

! " ; v /-'i 'ft* -1 if, V 

10 9'. The apparatus of claim 6 / wherein the control 

means operates to (i) place the dispensing device at a 
loading station , (ii) move the capillary channel in the 
device into a selected reagent at the loading station, 
to load the dispensing device with the reagent , and 

15 (iii) dispense the reagent at a defined array position 
on each of the supports on said holder. 

10. The apparatus of claim 6, wherein the control 
device further operates, at the end of a dispensing 

20 cycle, to wash the dispensing device by; (i) placing the 
dispensing device at a washing station^ (ii) ^moving the 
capillary channel in the device into a ^/ash fluid, to 
load the dispensing device with the fluid, arid (iii) 
remove the wash fluid prior to load dispensing 

25 device with a fresh selected reagent. 

11. The apparatus of claim 6, wherein said device 
is one of a plurality of such devices which are carried 
on the arm for dispensing different analyte assay 

30 reagents at selected spaced array positions. 



12. A substrate with a surface having a 
microarray of at least 10 3 distinct polynucleotide or 
polypeptide biopolymers per 1 cm 3 surface area, each 
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distinct biopolymer sample (i) being disposed at a 
separate, defined position in said array, (ii) having a 
length of at least 50 subunits, and (iii) being present 
in a defined amount . between about 0.1 femtomole and 100 
nanomoles. , • .>.':'V;, j;'-. - V '--.J \ 

13 "The substrate of claim ;.:12$, wherein said 
surface is glass slide .coated witli poly lysine, and said 
biopolymers are polynucleotides. ; ; ^ 



14.. The substrate of claim 12, wherein said 
substrate has a water- impermeable backing, a water- 
permeable film formed on the backing, and a grid formed 
on the film; where said grid :(i) is composed of 
15 intersecting water-impervious grid elements extending 

from said backing to positions raised above the surface 
of said film, and (ii) partitions the film into a 
plurality of water-impervious cells, where each cell 
contains such a biopolymer array. 

20 

15. A substrate with a surface array of sample- 
receiving cells , comprising 

a water-impermeable backing, 

a water-permeable film formed on the backing, and 
25 a grid formed on the film, said grid being composed of 
intersecting water-impervious grid elements extending 
from said backing to positions raised above the surface 
of said film. 

30 16. The substrate of claim 15, wherein the cells 

of the array each contain an array of biopolymers. 

17. A substrate for use in detecting binding of 
labeled biopolymers to one or more of a plurality 
35 distinct polynucleotides, comprising 
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a non-porous, glass substrate, 

a coating '.of a cat ionic -polymer on said substrate, 
and Y: 11 *■ : .: \' ■* > v; . 

an array of / distinct ipbl;ynucleot ides to said 
5 coating, where each biopolymer is disposed at a 
separate, defined positfion in a surface array of 
biopolymers. ; 1 j; I: > 

18. A method of detecting differential expression 

10 of each of a f plurality of genes in a first cell type 
with respect to expression of the sane genes in a 
second cell types, said method comprising 

producing fluorescence- labeled cDNA's from mRNA's 
isolated from the two cells types, where the cDNA's 

15 from the first and second' cells are labeled with first 
and second different fluorescent reporters, 

adding a mixture of the labeled cDNA's from the 
two cell types to an array of polynucleotides 
representing a plurality of known genes derived from 

20 the two cell types, under conditions that result in 
hybridization of the cDNA's to complementary-sequence 
polynucleotides in the array; and 

examining the array by fluorescence under 
fluorescence excitation conditions in which (i) 

25 polynucleotides in the array that are hybridized 

predominantly to cDNA's derived from one of the first 
and second cell types give a distinct first or second 
fluorescence emission color, respectively, and (ii) 
polynucleotides in the array that are hybridized to 

30 substantially equal numbers of cDNA's derived from the 
first and second cell types give a distinct combined 
fluorescence emission color, respectively, 

wherein the relative expression of known genes in 
the two cell types can be determined by the observed 

35 fluorescence emission color of each spot. 
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19. The method of claim 18| # wherein the array of 
polynucleotides; 1b formed' jbn a substrate with a surface 
having an array j o;f jat least 10 2 ^ distinct polynucleotide 
or polypeptide bibpolymers in a surface area of less 

5 than about 1 cm 2 , -each distinct biopolymer (i) being 
disposed; at a separate, defined position in said array, 
(ii) having ;a ! length of at least 50 subunits, and (iii) 
being present in a (defined amount ^between about .1 
femtomole andlOO nmoles. 1 ' 

■o : < r j. l -i 4 .; 4 hh|- 1 ; 

20. The method of claim 19, wherein said surface 

I ! I i i : - * * * t .t < v 

as a glass slide coated with poly lysine, and said 
biopolymers jare polynucleotides iioji-covalently bound to 
said poly lysine. M ! ! ' > 
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MEASUREMENT OF GENE EXPRESSION PROFILES 
IN TOXICITY DETERMINATION 

• ■ • / ' , . . . . ; 4 ; | t , f ^ . • - , r . 

5 Field of the Invention ,;i ; 

The invention relates generally to methods for detecting and bohitoring 
phenotypic changes in in vitro and in vivo systems for assessing arid/br detenriihing 
the toxicity of chemical compounds, arid more particularly, the invention relates to a 
method for detecting and' monitoring changes in gene expression patterns' in in vitro 
10 and in vivo systems for determining the toxicity of drug cahdi&tes/ ' 

; BACKGROUND ' ' ' : ' ! " 

The ability to rapidly and conveniently aissess the toxicity of new compounds 
is extremely important. Thousands 6f new compounds are synthesized eveiy year, 

1 5 and many are introduced to the environment through the development of new 

commercial products and processes, often with little knowledge of their short term 
and long term health effects. In the development of new drugs, the cost of assessing 
the safety and efficacy of candidate compounds is becoming astronomical: It is 
.estimated that the pharmaceutical industry spends an average of about 300 million 

20 dollars to bring a new pharmaceutical compound to market, e.g. Biotechnology. 13: 
226-228 (1995). A large fraction of these costs are due to the failure of candidate " 
compounds in the later stages of the developmental process. That is. as the 
assessment of a candidate drug progresses from the identification of a compound as a 
drug candidate-for example, through relatively inexpensive binding assays or in vitro 
25 screening assays, to pharmacokinetic studies, to toxicity studies, to efficacy studies in 
model systems, to preliminary clinical studies, and so on, the costs of the associated 
tests and analyses increases tremendously. Consequently, it may cost several tens of 
millions of dollars to determine that a once promising candidate compound possesses 
a side effect or cross reactivity that renders it commercially infeasible to develop 

0 further. A great challenge of pharmaceutical development is to remove from further 
consideration as early as possible those compounds that are likely to fail in the later 
stages of drug testing. 

Drug development programs are clearly structured with this objective in mind; 
however, rapidly escalating costs have created a need to develop even more stringent 

5 and less expensive screens in the early stages to identify false leads as soon as 

possible. Toxicity assessment is an area where such improvements may be made, for 
both drug development and for assessing the environmental, health, and safety effects 
of new compounds in general. 
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Typically the toxicity of a compound is determined by administering the 
compound to one or more spepies of test animal under ..controlled conditions and by 
monitoring the effects on a wide range of parameters. The parameters include such 
things as blood chemistry,, weight gain or loss, a variety of behavioral .patterns, muscle 
tone, body temperature, respiration rate, lethality, and the like, which collectively . 
provide a measure of the state of health of the test animal; The degree of deviation of 
such parameters from their, normal ranges gives a measure. of the toxicity, of a 
compound. Such tests may be designed to assess the acute, prolonged, or chronic 
toxicity of a compound. . In general, acute tests involve administration of the test 
chemical on one occasion. The period of observation of the test animals may be as 
short as a few hours, although it is usually at least 24 hours andin some cases it. may 
be as long as a week or more. In general, prolonged tests involve administration of 
the test chemical on multiple occasions. The test chemical may be administered one 
or more times each day, irregularly as when it is incorporated in the diet, at specific 
times such as during pregnancy, or in some cases regularly but only at weekly 
intervals. Also, in the prolonged test the experiment is usually conducted for not less 
than 90 days in the rat or mouse or a year in the dog. In contrast to the acute and 
prolonged types of test, the chronic toxicity tests are those in which the test chemical 
. is administered for a substantial portion of the lifetime of the test animal. In the case 
of the mouse or rat, this is a period of 2 to 3 years. In the case of the dog, it is for 5 to 
7 years. 

Significant costs are incurred in establishing and maintaining large cohorts of 
test animals for such assays, especially the larger animals in chronic toxicity assays. 
Moreover, because of species specific effects, passing such toxicity tests does not 
ensure that a compound is free of toxic effects when used in humans. Such tests do, 
however, provide a standardized set of information for judging the safety of new 
compounds, and they provide a database for giving preliminary assessments of related 
compounds. An important area for improving toxicity determination would be the 
identification of new observables which are predictive of the outcome of the 
expensive and tedious animal assays. 

In other medical fields, there has been significant interest in applying recent 
advances in biotechnology, particularly in DNA sequencing, to the identification and 
study of differentially expressed genes in healthy and diseased organisms, e.g. Adams 
et al. Science, 252: 1651-1656 (1991); Matsubara et al, Gene. 135: 265-274 (1993); 
Rosenberg et al. International patent application, PCT/US95/01863. The objectives 
of such applications include increasing our knowledge of disease processes, 
identifying genes that play important roles in the disease process, and providing 
diagnostic and therapeutic approaches that exploit the expressed genes or their 
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products. While siichapproaches are attractive, those based on exhaustive,, or even 
sampled, sequencing of expressed genes are still beset by'theenbrmoiis effort 
required: It is estimated that 30-35 thousand different genes are expressed in a typical 
mammalian tissue in any givehstate, e.g.- Ausubel et al; Editors^ Current Protocols, 
'5:8:1-5.8.4 (John / Wiley & Sons, New Yofk; 1992). Determining the sequences of 
even a smallsample of that number of gene products is a major enterprise,: requiring 
industrial-scale resources. Thus, the routine application of massive sequencing of 
expressed genes is still beyond cturent commercial technology: * ? * 

The availability of new assays for assessing the toxicity of compounds, such 
as candidate drugs; that would provide more comprehensive and precise information 
about the state of health of a test animal would be highly desirable. Such additional 
assays would preferably be less expensive, more rapid, and more convenient than 
current testing procedures, and would at the same time provide enough information to 
make early judgments regarding the safety of new cbmpounds. > ; 

Summary of the Invention 
An object of the invention is to provide a new approach to toxicity assessment 
based on an examination of gene expression patterns; or profiles, in in vitro or in vivo 
.test systems. f - ^ ^ 

Another object of the invention is to provide a database on which to base 
decisions concerning the toxicological properties of chemicals, particularly drug 
candidates. 

A further object of the invention is to provide a method for analyzing gene 
expression patterns in selected tissues of test animals. - 

A still further object of the invention is to provide a system for identifying 
genes which are differentially expressed in response to exposure to a test compound. 

Another object of the invention is to provide a rapid and reliable method for 
correlating gene expression with short term and long term toxicity in test animals. 

Another object of the invention is to identify genes whose expression is 
predictive of deleterious toxicity. 

The invention achieves these and other objects by providing a method for 
massively parallel signature sequencing of genes expressed in one or more selected 
tissues of an organism exposed to a test compound. An important feature of the 
invention is the application of novel DNA sorting and sequencing methodologies that 
permit the formation of gene expression profiles for selected tissues by determining 
the sequence of portions of many thousands of different polynucleotides in parallel. 
Such profiles may be compared with those from tissues of control organisms at single 
or multiple time points to identify expression patterns predictive of toxicity. 
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The sorting methodology of the invention makes use.ofoligonucleotide tags 
that are members of a minimally cross-hybridizing set of oligonucleotides. : The 
sequences of oligonucleotides of such a set differ frpm the sequences, of every other 
member of the same set by at least two nucleotides. Thus, each member of such a set 
cannot form a duplex (or triplex) with the complement of any other member with less 
than two mismatches. Complements of oligonucleotide tags of the invention, referred 
to herein as "iag complements,;; may comprise natural nucleotides or non-natural I, 
nucleotide analogs. Preferably, tag complements are attached, to. solid, phase, supports. 
Such oligonucleotide tags when used with their corresponding. tag complements 
provide a means of enhancing specificity of hybridization for sorting polynucleotides, 
suchascDNAs. 

The polynucleotides to be sorted each have an oligonucleotide tag attached, , 
such that different polynucleotides have different tags. As explained mpre fully 
below, this condition is achieved by employing a repertoire of tags substantially 
greater than the population of polynucleotides and by taking a sufficiently small 
sample of tagged polynucleotides from the full ensemble of tagged polynucleotides. 
After such sampling, when the populations of supports, and polynucleotides are mixed 
under conditions which permit specific hybridization of the oligonucleotide tags wi th 
-their respective complements, identical polynucleotides sort onto particular beads or 
regions. The sorted populations of polynucleotides can then be sequenced on the 
solid phase support by a "single-base" or "base-by-base" sequencing methodology, as 
described more fully below. 

In one aspect, the method of the invention comprises the following steps: (a) 
administering the compound to a test organism; (b) extracting a population of mRN A 
molecules from each of one or more tissues of the test organism; (c) forming a 
separate population of cDNA molecules from each population of mRNA molecules 
extracted from the one or more tissues such that each cDNA molecule of the separate 
populations has an oligonucleotide tag attached, the oligonucleotide tags being 
selected from the same minimally cross-hybridizing set; (d) separately sampling each 
population of cDNA molecules such that substantially all different cDNA molecules 
within a separate population have different oligonucleotide tags attached; (e) sorting 
the cDNA molecules of each separate population by specifically hybridizing the 
oligonucleotide tags with their respective complements, the respective complements 
being attached as uniform populations of substantially identical complements in 
spatially discrete regions on one or more solid phase supports; (0 determining the 
nucleotide sequence of a portion of each of the sorted cDNA molecules of each 
separate population to form a frequency distribution of expressed genes for each of 
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the one or more tissues; and (g) correlating the frequency distribution of expressed 
genes in each of the one or more tissues with the toxicity of the compound. 

^important aspect of the invention is the identification of genes whose 
expression is preidictive of thetoxicity of a compound. Oncesuch genes are v 
jdentified,*hey may be employed in conventional assays, such as reverse transcriptase 
polymerase chain reaction (RT-PCR) assays for gene expression. ' 

Brief Description of the Drawings . ^ 
Figure 1 is a flow chart representation of an algorithm for generating 

minimally cross-hybridizing sets of oligonucleotides. 

Figure 2 diagrammatically illustrates an apparatus for carrying out 

polynucleotide sequencing in accordance with the invention. 

Definitions 

"Complement" or "tag complement" as used herein in reference to 
oligonucleotide tags refers to an oligonucleotide to which a oligonucleotide tag 
specifically hybridizes to form a perfectly matched duplex or triplex. In embodiments 
where specific hybridization results in a triplex, the oligonucleotide tag may be 
.selected to be either double stranded or single stranded; thus, where triplexes are 
formed, the term "complement" is meant to encompass either a double stranded 
complement of a single stranded oligonucleotide tag or a single stranded complement 
of a double stranded oligonucleotide tag! 

The term "oligonucleotide" as used herein includes linear oligomers of natural 
or modified monomers or linkages, including deoxyribonucleosides, ribonucleosides, 
anomeric forms thereof, peptide nucleic acids (PNAs), and the like, capable of 
specifically binding to a target polynucleotide by way of a regular pattern of 
monomer-to-monomer interactions, such as Watson-Crick type of base pairing, base 
stacking, Hoogsteen or reverse Hoogsteen types of base pairing, or the like. Usually 
monomers are linked by phosphodiester bonds or analogs thereof to form 
oligonucleotides ranging in size from a few monomelic units, e.g. 3-4, to several tens 
of monomelic units. Whenever an oligonucleotide is represented by a sequence of 
letters, such as "ATGCCTG," it will be understood that the nucleotides are in 5'->3' 
order from left to right and that "A" denotes deoxyadenosine, "C H denotes 
deoxycytidine, "G" denotes deoxyguanosine, and "T denotes thymidine, unless 
otherwise noted. Analogs of phosphodiester linkages include phosphorothioate, 
phosphorodithioate, phosphoranilidate, phosphoramidate, and the like. Usually 
oligonucleotides of the invention comprise the four natural nucleotides; however, they 
may also comprise non-natural nucleotide analogs. It is clear to those skilled in the 
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art when oligonucleotides having natural or non-natural nucleotides may be 
employed, e.g. where.processing by enzymes is called for, usually oligonucleotides 
consisting of natural nucleotides are required. 

"Perfectly matched" in reference to a duplex means that the poly- or 
oligonucleotide strands making up the duplex form a double stranded structure with 
one other such that every nucleotide in each strand undergoes. Watson-Crick 
basepairing with a nucleotide in the other strand. The.term also comprehends the 
pairing of nucleoside analogs, such as deoxyinosine, nucleosides with 2-aminopurine 
bases, and the like, that may be employed. In reference to a triplex, the term means, .. 
that the triplex consists of a perfectly matched duplex and a third^strand in which 
every nucleotide undergoes Hoogsteen or reverse, Hopgsteen association with a 
basepair of the perfectly matched duplex. Conversely, a "mismatch" in a duplex 
between a tag and an. oligonucleotide means that a pair or triplet of nucleotides in the 
duplex or triplex fails to undergo Watson-Crick and/or Hoogsteen.and/or reverse 
Hoogsteen bonding. . 

As used herein, "nucleoside" includes the natural nucleosides, including 2'- 
deoxy and 2'-hydroxyl forms, e.g. as described in Kornberg and Baker, DNA 
Replication, 2nd Ed. (Freeman, San Francisco, 1992). "Analogs" in reference to . 
. nucleosides includes synthetic nucleosides having modified base moieties and/or 
modified sugar moieties, e.g. described by Scheit, Nucleotide Analogs (John Wiley, 
New York, 1980); Uhlman and Peyman, Chemical Reviews, 90: 543-584 (1990). or 
the like, with the only proviso that they are capable of specific hybridization. Such 
analogs include synthetic nucleosides designed to enhance binding properties, reduce 
complexity, increase specificity, and the like. 

As used herein "sequence determination" or "determining a nucleotide 
sequence" in reference to polynucleotides includes determination of partial as well as 
full sequence information of the polynucleotide. That is, the term includes sequence 
comparisons, fingerprinting, and like levels of information about a target 
polynucleotide, as well as the express identification and ordering of nucleosides, 
usually each nucleoside, in a target polynucleotide. The term also includes the 
determination of the identification, ordering, and locations of one, two, or three of the 
four types of nucleotides within a target polynucleotide. For example, in some 
embodiments sequence determination may be effected by identifying the ordering and 
locations of a single type of nucleotide, e.g. cytosines. within the target polynucleotide 
"CATCGC ..." so that its sequence is represented as a binary code, e.g. " 1 00101 ... " for 
"C-(not C)-(not C)-C-(not C)-C ... " and the like. 
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As used herein, the term "complexity" in reference to a population of 
polynucleotides mearis.the number of different species of molecule present in the 
population.' 1 /: ' u '* • ' r * v ; * 

As used' herein, the terms "g£rie expression profile," and "gene expression > 
5 pattern" which is used equivalently, means a frequency distribution of sequences of 
portions of cDNA molecules sampled from a population of tag-cDNA conjugates. 
Generally, the portions of sequence are sufficiently long to uniquely identify the 
cDNA frbm which the portion arose. Preferably^ the total number of sequences . 
determined is at least 1000; more preferably^ the total number of sequences 
10 determined in a gene expression profile is at least fen thousand. ; : 

As used herein, "test organism" means any in 'Vitro- or in vivo system which 
provides measureable responses to exposure to test compounds: Typically, test 
organisms may be mammalian cell cultures, particularly of specific tissues, such as 
hepatocytes, neurons, kidney cells, colony forming cells, or the like, or test organisms 
1 5 may be whole animals, such as rats, mice, hamsters, guinea pigs, dogs, cats, rabbits, 
pigs, monkeys, and the like. . , . , 

Detailed Description of the Invention 
The invention provides a method for determining the toxicity of a compound 

20 by analyzing changes in the gene expression profiles in selected tissues of test 
organisms exposed to the compound. The invention also provides a method of 
identifying toxicity markers consisting of individual genes or a group of genes that is 
expressed acutely and which is correlated with prolonged or chronic toxicity, or 
suggests that the compound will have an undesirable cross reactivity. Gene 

25 expression profiles are generated by sequencing portions of cDNA molecules 
construction from mRNA extracted from tissues of test organisms exposed to the 
compound being tested. As used herein, the term "tissue" is employed with its usual 
medical or biological meaning, except that in reference to an in vitro test system, such 
as a cell culture, it simply means a sample from the culture. Gene expression profiles 

30 derived from test organisms are compared to gene expression profiles derived from 
control organisms to determine the genes which are differentially expressed in the test 
organism because of exposure to the compound being tested. In both cases, the 
sequence information of the gene expression profiles is obtained by massively parallel 
signature sequencing of cDNAs, which is implemented in steps (c) through (f) of the 

35 above method. 

Toxicity Assessment 
Procedures for designing and conducting toxicity tests in in vitro and in vivo 
systems is well known, and is described in many texts on the subject, such as Loomis 
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et al. Loomis's Esstentials of Toxicology, 4th Ed. (Academic Press, New York. 1996); 
Echobichon, The Basics of Toxicity Testing (CRC Press, Boca Raton, 1992); Frazier, 
editor. In Vitro Toxicity Testing (Marcel Dekker, Ne t w York, 1 992); and the like. ; 

. i5 .In .toxicity, testing, two groups qf test organisms are usually employed: one 
group serves as axpntrol and the otbergroup receives the test compound in a single 
dose (for acute toxicity, tests) or a.regimen ofdoses (for prolonged or chronic toxicity 
tests). Since in most cases, the extraction of tissue as called for. in the method of the 
invention requires sacrificing the testanimal.both the control group and the group . , 
receiving compound must-be large enough to, permit .removal of animals.for sampling 
tissues, if it : is desired to observe the dynamics/rf gene expression through the 
duration of an experiment.. 

In setting, up a toxicity study, extensive guidance is provided in)the; literature 
for selecting, the appropriate test organism for the compound being tested, route of 
administration, dose ranges, and the like. Water, or physiological saline,(0.9% NaCI 
in water) is the.solute of choice forthe .test compound sincethese sol vents permit 
administration by a. variety of routes. When this is not possible because of solubility 
limitations, it is necessary to resort to the use of vegetable oils such as corn oil or 
even organic solvents, of which propylene glycol is commonly used. Whenever 
-possible the use of suspension of emulsion should be avoided except for oral 
administration. Regardless of the route of administration,, the volume required to 
administer a given dose is limited by the size of the animal that is used. It is desirable 
to keep the volume of each dose uniform within and between groups of animals. 
When rates or mice are used the volume administered by the oral route should not 
exceed 0.005 ml per gram of animal. Even when aqueous or physiological saline 
solutions are used for parenteral injection the volumes that are tolerated are limited, 
although such solutions are ordinarily thought of as being innocuous. The 
intravenous LD 50 of distilled water in the mouse is approximately 0.044 ml per gram 
and that of isotonic saline is 0.068 ml per gram of mouse. 

When a compound is to be administered by inhalation, special techniques for 
generating test atmospheres are necessary. Dose estimation becomes very 
complicated. The methods usually involve aerosolization or nebulization of fluids 
containing the compound. If the agent to be tested is a fluid that has an appreciable 
vapor pressure, it may be administered by passing air through the solution under 
controlled temperature conditions. Under these conditions, dose is estimated from the 
volume of air inhaled per unit time, the temperature of the solution, and the vapor 
pressure of the agent involved. Gases are metered from reservoirs. When particles of 
a solution are to be administered, unless the particle size is less than about 2 urn the 
particles will not reach the terminal alveolar sacs in the lungs. A variety of 
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apparatuses and chambers are available to perform studies for "detecting effects of 
irritant or other toxic endpoihts' when they are administered by inhalation. The' 
preferred method of administering ah agent to animals is via the oral route, either by. 
intubation or by incorporating the agent in the feed^ " ;*-.*;:'■. 

5 Preferably, in designing a toxicity assessment, two or riiore species should be 

employed that handle the test compound as similarly to man as possible in terms of 
metabolism, absolution, excretion, tissue storage, and tbe like: Preferably, multiple - 
doses or regimens at different concentrations should be employed to establish a dose- 
response relationship with respect to toxic effects. And preferably, 1 the route of : 1 

1 0 administration to the test animal should be the same as, or as similar as possible to, 
the route of administration of the compound to man. Effects obtained by one route of 
administration to test animals are not a priori applicable to effects by another route of 
administration to man. For example, food additives for man should be tested by- 
admixture of the material in the diet of the test :ariimdls. * 

1 5 Acute toxicity tests consist of administering a compound to test organisms on 

one occasion. The purpose of such test is to determine the symptomotology 
consequent to administration of the compound and to determine the degree of lethality 
of the compound. The initial procedure is to perform a series of range-finding doses 
jof the compound in a single species. This necessitates selection of a route of 

20 administration, preparation of the compound in a form suitable for administration by 
the selected route, and selection of an appropriate species. Preferably, initial acute 
toxicity studies are performed on either rats or mice because of their low cost, their 
availability, and the availability of abundant toxicologic reference data on these 
species. Prolonged toxicity tests consist of administering a compound to test 

25 organisms repeatedly, usually on a daily basis, over a period of 3 to 4 months. Two 
practical factors are encountered that place constraints on the design of such tests: 
First, the available routes of administration are limited because the route selected 
must be suitable for repeated administration without inducing harmful effects. And 
second, blood,- urine, and perhaps other samples, should be taken repeatedly without 

30 inducing significant harm to the test animals. Preferably, in the method of the 
invention the gene expression profiles are obtained in conjunction with the 
measurement of the traditional toxicologic parameters, such as listed in the table 
below: 
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Hematoloj; 
erythrocyte count 
total leukocyte count 
differential leukocyte 
count 
hematocrit 
hemoglobin 



Blood C hemisti 
sodium 
potassium ! 
chloride 

calcium 

carbon dioxide 

serum glutamine-pyruvate 

transaminase 

serum glutamin-oxalacetic 
transaminase 
serum protein 
^electrophoresis 
blood sugar 
blood urea nitrogen 
total serum protein 
serum albumin 
total serum bilirubin 



Urine A nalyses 

pH 

specific gravity 
total protein 

sediment 
glucose 
ketones 

bilirubin 



Oligonucleotide Tags and T a p Comr jlement. ' 
Oligonucleotide tags are members of a minimally cross-hybridizin^ set of 
oligonucleotides. The sequences of oligonucleotides of such a set differ from the 
sequences of every other member of the same set by at least two nucleotides Thus 
each member of such a set cannot form a duplex (or triplex) with the complement o'f 
any other memberwith less thantwo mismatches. Complements of oligonucleotide 
tags, referred to herein as complements," may comprise natural nucleotides or 
non-natural nucleotide analogs; Preferably, tag complements are attached to solid 
phase supports. Such oligonucleotide tags when used with their corresponding tag 
complements provide a means of enhancing specificity of hybridization for sorting, 
tracking, or labeling molecules, especially polynucleotides. 

Minimally cross-hybridizing sets of oligonucleotide tags and tag complements 
may be synthesized either combinatorial^ or individually depending on the size of the 
set desired and the degree to which cross-hybridization is sought to be minimized (or 
stated another way, the degree to which specificity is sought to be enhanced). For 
example, a minimally cross-hybridizing set may consist of a set of individually 
synthesized 10-mer sequences that differ from each other bv at least 4 nucleotides 
such set having a maximum size of 332 (when composed of 3 kinds of nucleotides 
and counted using a computer program such as disclosed in Appendix Ic) 
Alternatively, a minimally cross-hybridizing set of oligonucleotide tags may also be 
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assembled combinatorially from subunits which themselves are selected from a 
minimally cross-hybridizing set. For example, a set of minimally cross-hybridizing 
12-mers differing from one another by at least three nucleotides may be synthesized 
by assembling 3 subunits selected from a set of minimally cross-hybridizing 4-mers 
5 that each differ from one another by three nucleotides. Such an embodiment gives a 
maximally sized set of 9 3 , or 729, 12-mers. The number 9 is number of 
oligonucleotides listed by the computer program of Appendix la, which assumes, as 
with the 10-mers, that only 3 of the 4 different types of nucleotides are used. The set 
is described as "maximal" because the computer programs of Appendices Ia-c provide 
0 the largest set for a given input (e.g. length, composition, difference in number of 
nucleotides between members). Additional minimally cross-hybridizing sets may be 
formed from subsets of such calculated sets. 

Oligonucleotide tags may be single stranded and be designed for specific' 
hybridization to single stranded tag complements by duplex formation or for specific 
5 hybridization to double stranded tag complements by triplex formation. 

Oligonucleotide tags may also be double stranded and be designed for specific 
hybridization to single stranded tag complements by triplex formation. 

When synthesized combinatorially, an oligonucleotide tag preferably consists 
.of a plurality of subunits, each subunit consisting of an oligonucleotide of 3 to 9 
nucleotides in length wherein each subunit is selected from the same minimally cross- 
hybridizing set. In such embodiments, the number of oligonucleotide tags available 
depends on the number of subunits per tag and on the length of the subunits. The 
number is generally much less than the number of all possible sequences the length of 
the tag, which for a tag n nucleotides long would be 4 n . 

Complements of oligonucleotide tags attached to a solid phase support are 
. used to sort polynucleotides from a mixture of polynucleotides each containing a tag. 
Complements of the oligonucleotide tags are synthesized on the surface of a solid 
phase support, such as a microscopic bead or a specific location on an array of 
synthesis locations on a single support, such that populations of identical sequences 
are produced in specific regions. That is, the surface of each support, in the case of a 
bead, or of each region, in the case of an array, is derivatized by only one type of 
complement which has a particular sequence. The population of such beads or regions 
contains a repertoire of complements with distinct sequences. As used herein in 
reference to oligonucleotide tags and tag complements, the term "repertoire" means 
the set of minimally cross-hybridizing set of oligonucleotides that make up the tags in 
a particular embodiment or the corresponding set of tag complements. 

The polynucleotides to be sorted each have an oligonucleotide tag attached, 
such that different polynucleotides have different tags. As explained more fully 
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below, this condition is achieved by employing a repertoire of tags substantially 
greater than the population of polynucleotides andby taking.a sufficiently small t ' 
sample of tagged polynucleotides from the fullensemble of .tagged polynucleotides. , 
After such sampling, whenjhe populations of supports and polynucleotides are mixed 
under conditions ..which permit specific, hybridization of the oligonucleotide tags with 
their respective complements, identical polynucleotides sort onto particular beadsor 
regions. 

The nucleotide sequences of oligonucleotides of a minimally cross-hybridizing 
set are conveniently enumerated by simple computer programs, such as those 
exemplified by programs whose.source codes are listed? in Appendices la and lb. 
Program minhx of Appendix la computes all minimally cross-hybridizing sets having 
4-mer subunits composed of three kinds of nucleotides. Program. tagN of Appendix 
lb enumerates longer oligonucleotides of a minimally cross n hybridizing set, Similar 
algorithms and computer programs are readily written for listing oligonucleotides ;of 
1 5 minimally cross-hybridizing sets for any embodiment of the invention. Table I below 
provides guidance as to me,size of sets of minimallyicross^hybridizing , . .... 
oligonucleotides for the indicated lengths and number of nucleotide differences. The 
above computer programs were used to generate the numbers. 



10 



20 



• • • . ■■ ■> ..... 

Tablel 

Nucleotide 
.Difference 
between Maximal Size 



Oligonucleotides of Minimally ' Size of 

Oligonucleotid of Minimally Cross, RcpenoTe Size of 

w ^ rH uu^*"c Hybridizing with Four Repertoire wiih 

S, Hybnd mngSet Set Words Five Words 

4 3 9 6561 590x I0 4 

6 3 27 5.3x l0 5 |.43x I0 7 

7 4- 27 5.3 x 10 s 1.43 x I0 7 

7 5 * 4096 3.28xl0 4 

8 3 "90 1.30 xlO 9 2.48 x lO 11 
8 4 « 1.48 xlO 7 9.16 x10 s 

8 5 '8 1.05 x10 s 1.89x)0 6 

9 - 5 39 2.31 xlO 6 9 02xlO 7 

10 5 332 1.21 xlO 10 

10 6 28 6.15xlO S 1.72 x I0 7 

11 5 |87 
18 6 =25000 
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For some embodiments of the invention, where extremely large repertoires of 
tags are not required, oligonucleotide tags of a minimally cross-hybridizing set may 
be separately synthesized. Sets containing several hundred to several thousands, or 
even several tens of thousands, of oligonucleotides may be synthesized directly by a 
variety of parallel synthesis approaches,. e.g. as disclosed in -Frank, et.al, U.S. patent 
4,689,405; Frank et al, Nucleic Acids Research, 1 1: 4365-4377 (1983); Matson et al, 
Anal. Biochem., 224; U0-M6 (1995); Fpdor„et al,, International application 
PCT/US93/04145; Peaseet al,J>roc.. Natl. Acad. . Sci,, 9,1: 5022-5026 (1994); 
iSouthern et.al, J. Biotechnology, 35: 2.1 7-227 (1994), Brennan, International 
application PCT/US94/05896; Lashkari et al, Proc. Natl. Acad. Sci., 92: 7912-7915 
(1995); or the like. 

Preferably, oligonucleotide tags of the invention are synthesized 
combinatorial^ out of subunits between three and six nucleotides in leqgth and 
selected from the same minimally cross-hybridizing set. For oligonucletides in this 
range, the members of such sets may be enumerated by computer programs based on 
the algorithm of Fig. 1. 

The algorithm of Fig. 1 is implemented by first defining the characteristics of 
the subunits of the minimally cross-hybridizing set, i.e T length, number of base 
differences between members, and composition, e.g. do they consist of two, three, or 
four kinds of bases. A table M n , n=l, is generated (100) that consists of all possible 
sequences of a given length and composition. An initial, subunit S\ is selected and 
compared (120) with -successive subunits Sj for i=n+l to the end of the table.. 
Whenever a successive subunit has, the required number of mismatches to be a 
member of the minimally cross-hybridizing set, it. is saved in a new table M n +j (125), 
that also contains subunits previously selected in prior passes through step 120. For 
example, in the first set of comparisons, M2 will contain S j ; in the second set of 
comparisons, M3 will contain S\ and S2; in the third set of comparisons, M4 will 
contain Sj, S2, and S3; and so on. Similarly, comparisons in table Mj will be 
between Sj and all successive subunits in Mj. Note that each successive table M n +] 
is smaller than its predecessors as subunits are eliminated in successive passes 
through step 130. After every subunit of table M n has been compared (140) the old 
table is replaced by the new table M n+1 , and the next round of comparisons are 
begun. The process stops (1 60) when a table M n is reached that contains no 
successive subunits to compare to the selected subunit Sj, i.e. M n =M n+ i . 

Preferably, minimally cross-hybridizing sets comprise subunits that make 
approximately equivalent contributions to duplex stability as every other subunit in 
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20 



the set. In this way, the stability of perfectly matched duplexes/between every .subunit 
and its complement is approximately equal. ( .Guidance fpr selecting such sets is 
provided by published techniques, for . selecting ppdmal.PCR primers and calculating 
duplex stabilities, e.g. RycWi^tal. Nucleic Acids Res^cJ^ 17: : 8543.,8551 (1,989) . 

(1986); Wetmur, Crit. Rev. Biochem. Mol. Biol., 26: 227-259 (1991);and the like. 
For shorter tags, e.g. about 30 nucleotides or less, r the algorithm described by Rychlik 
and Wetmur is preferred, and forJonger tags, : e.g. about 3Q-35 nucleotides or greater, 
an algorithm disclosed by Suggs etal,.pages 683-693,in Brown,, editor, ICN-UCLA 
Symp. Dev. Biol., Vol. ,23 (Academic. Press, New York, 1981) may be. conveniently 
employed. Clearly. the are many approaches available to. one skilled in the art for 
designing sets of minimally cross^hybridizing subunits within the scope of the 
invention. For example, to minimize the affects of different base-stacking energies of 
terminal nucleotides when subunits are. assembled, subunits may be provided that 
have the same terminal nucleotides. In this way, when subunits are linked, the sum of 
the base-stacking energies of all the adjoining, terminal nucleotides will be.the same, 
thereby reducing or eliminating variability.in tag melting temperatures. 

A "word" of terminal nucleotides, shown in italic below. may also be added to 
. each end of a tag so that a perfect match is always formed between it and a similar 
terminal "word" on any other tag complement. Such an augmented tag would have 
the form: 



25 



30 



35 



w 


w, 


w 2 ... w k ., 




w 


w 


wr 


w 2 - ... w t .,< 




w 



where the primed W's indicate complements. With ends of tags always forming 
perfectly matched duplexes, all mismatched words will be internal mismatches 
thereby reducing the stability of tag-complement duplexes that otherwise would have 
mismatched words at their ends. It is well known that duplexes with internal 
mismatches are significantly less stable than duplexes with the same mismatch at a 
terminus. 

A preferred embodiment of minimally cross-hybridizing sets are those whose 
subunits are made up of three of the four natural nucleotides. As will be discussed 
more fully below, the absence of one type of nucleotide in the oligonucleotide tags 
permits target polynucleotides to be loaded onto solid phase supports by use of the 
5'->3' exonuciease activity of a DNA polymerase. The following is an exemplary 
minimally cross-hybridizing set of subunits each comprising four nucleotides selected 
from the group consisting of A, G, and T: 



-14- 



WO 97/13877 



PGT/US96/16342 



5 Table II 

Word: wj w 2 w 3 w 4 

Sequence: GATT TGAT TAGA TTTG 

ATGT AAAG 

In this set, each member would form a duplex having three mismatched bases with 
10 the complement of every othermember. J 

Further exemplary minimally cross-hybridizing sets are listed below in Table 
III. Clearly, additional sets can be generated by substituting different groups of 
"nucleotides, or by using subsets of known minimally cross-hybridizing sets. 

15 

Table III 

Exemplary Minimally Cross-Hvbridizing Sets of 4-mer Subunits 



Set 1 


Set 2 


Set 3 


Set 4 


Set 5 


Set 6 


CATT 


ACCC 


AAAC 


AAAG 


AACA 


AACG 


CTAA 


AGGG 


ACCA 


ACCA 


ACAC 


ACAA 


TCAT 


CACG 


AGGG 


AGGC 


AGGG 


AGGC 


ACTA 


CCGA 


CACG 


CACC 


CAAG 


CAAC 


TACA 


CGAC 


■ CCGC 


CCGG 


CCGC 


CCGG 


TTTC 


GAGC 


CGAA 


CGAA 


CGCA 


CGCA 


ATCT 


GCAG 


GAGA 


GAGA 


GAGA 


GAGA 


AAAC 


GGCA 


GCAG 


GCAC 


GCCG 


GCCC 




AAAA 


GGCC 


GGCG . 


GGAC 


GGAG 



Word: w 5 w 6 ; 

Sequence: GTAA AGTA 
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Set I 
AAGA . 
ACAC 
AGCG ' 
CAAG • 
CCCA 
CGGC 
GACC 
GGGG. : 
GGAA 



Set 8 


oc l y 


bet 10 


Set 11 


AAGC , 


LAPP 


.ALAG; : 


.-. ' j ACCG.,;, 


ACAA 




. AALA , 


AAAA. 


AGCG 




AbCaC 


• AGGC 


.\CAAG • . 


GAAC- v. 




■ •■■•'LALG '■ 


CCCC . 


CCCG 


CCGA 


:CCGA . 


CGGA 


CGGA 


CGCG 


CGAG 


1 GAGA • 


'GAGA 


• GAGG 


: 'GAGG - 


GCGG . . 


„GCGC 


GCCC . 


.1- ,.GCAC 


GGAC 


GGAG 


GGAA 


GGCA 



Set 12 
•-ACGA 

AAAC 
' AGCG ' 
■ GAGA ■: • 

CCAG 

CGGC 
: GAGG . ! 
-GCCC 

GGAA 



The oligonucleotide tags of the invention and their complements are v 
conveniently synthesized on an automated DNA synthesizer, e.g. an Applied 
BiosystemSi Inc. (Foster City •California) model 392 or 394 DNA/RNA Synthesizer! 
using standard chemistries, such as phosphoramidite chemistry, e.g. disclosed in the 
following references: Beaiicage and Iyer, Tetrahedron, 48: 2223-231 T( 1992); Molko 
et al, U.S. patent 4,980,460; Koster et al, U.S. patent 4,725,677; Caruthers et al, UiS. 
patents 4,415,732; 4,458,066; and 4^973,679; and the like. Alternative chemistries, 
e.g. resulting in non-natural backbone groups. such as phosphorothioate, 
phosphoramidate, and the like, may also be employed provided that the resulting 
.oligonucleotides are capable of specific hybridization. In some embodiments, tags 
may comprise naturally occurring nucleotides that permit processing or manipulation 
by enzymes, while the corresponding tag complements may comprise non-natural 
nucleotide analogs, such as peptide nucleic acids, or like compounds, that promote the 
formation of more stable duplexes during sorting. 

When microparticles are used as supports, repertoires of oligonucleotide tags 
and tag complements may be generated by subunit-wise synthesis via "split and mix- 
techniques, e.g. as disclosed in Shortle et al. International patent application 
PCT/US93/03418 or Lyttle et al, Biotechniques, 19: 274-280 (1995). Briefly, the 
basic unit of the synthesis is a subunit of the oligonucleotide tag. Preferably, 
phosphoramidite chemistry is used and 3' phosphoramidite oligonucleotides are 
prepared for each subunit in a minimally cross-hybridizing set, e.g. for the set first 
listed above, there would be eight 4-mer 3 , -phosphoramidites. Synthesis proceeds as 
disclosed by Shortle et al or in direct analogy with the techniques employed to 
generate diverse oligonucleotide libraries using nucleosidic monomers, e.g. as 
disclosed in Telenius et al, Genomics, 13: 718-725 (1992); Welsh et al. Nucleic Acids 
Research, 19: 5275-5279(1991); Grothues et al, Nucleic Acids Research, 21: 1321- 
1322 (1993); Hartley, European patent application 90304496.4; Lam et al, Nature. 
354: 82-84 (1991); Zuckerman et al, Int. J. Pept. Protein Research, 40: 498-507 
(1992); and the like. Generally, these techniques simply call for the application of 
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^mixtures of the activated monomers to the growing oligonucleotide during the 
coupling steps. Preferably'; oligonucleotide tags and tag complements are synthesized 
oh a DNA synthesizer having a number of synthesis chambers which is greater than or 
equal'to the number of different kinds of words used in the construction of the tags. 

5 That is, preferably tliere is a synthesis chamber corresponding to each type of word. 
In this embodiment, words are added nucleotide-by-riucleotide; such that if a word 
consists of five nucleotides there are fiVe monomer Couplings in each synthesis ♦ 
chamber. After a word is completely'syntheisized, the synthesis supports are removed 
from the chambers, mixed, and redistributed back to the chambers for the next cycle 

0 of word addition.^ This latter embodiment takes advantage of the high coupling yields 
of monomer addition, e.g. in phosphoramidite chemistries. * L » ' : 

Double stranded forms of tags may be made by separately synthesizing the 
complementary strands followed by mixing under conditions that permit duplex 
formation. Alternatively, double stranded tags may be formed by first synthesizing a 

5 * single stranded repertoire linked to a known oligonucleotide sequence that serves as a 
primer binding site. The second strand is then synthesized by combining the single 
stranded repertoire with a primer and extending with a polymerase. This latter 
approach is described in Oliphant et al, Gene, 44: 177-183 (1986). Such duplex tags 
- may then be inserted into cloning vectors along with target polynucleotides for sorting 

0 and manipulation of the target polynucleotide in accordance with the invention. 

When tag complements are employed that are made up of nucleotides that 
have enhanced binding characteristics, such as PNAs or oligonucleotide N3'— >P5' 
phosphoramidates, sorting can be implemented through the formation of D-loops 
between tags comprising natural nucleotides and their PNA or phosphoramidate 

5 complements, as an alternative to the "stripping" reaction employing the 3'— »5" 
exonuclease activity of a DNA polymerase to render a tag single stranded. 

Oligonucleotide tags of the invention may range in length from 12 to 60 
nucleotides or basepairs. Preferably, oligonucleotide tags range in length from 18 to 
40 nucleotides or basepairs. More preferably, oligonucleotide tags range in length 

0 from 25 to 40 nucleotides or basepairs. In terms of preferred and more preferred 
numbers of subunits, these ranges may be expressed as follows: 

Table IV 

Numbers of Subunits in Tags in Preferred Embodiments 

5 

Monomers 

in Subunit Nucleotides in Oligonucleotide Tag 

(12-60) (18-40) (25-40) 
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3 4-20 subunits 6-13 subunits 8-13 subunits 

4 3M5 subunits 4ilOsubunits • ' 6^10 subunits 1 
5 . ... 2 - 1 2 subunits . . 3-8 subunits . . 5-8 subunits, 

••v.- . «••< ... "2X10 subunits-" ' 3-6 subunits ' - 4-6 subunits 

Most preferably, oligonucleotide tags are single stranded and specific hybridization 
occurs via Watson-Crick pairing with a tag complement. 

Preferably, repertoires of single stranded oligonucleotide tags of the invention 
5 i contain at least 1 00 members; more preferably, repertoires of such tags contain at 
least 1000 members; and most preferably; repertbires of such tags contain at least 
10,000 members. : 'i :> ^nyj. 

" ,<: --" l? ' ; ' Triplex Taps i -i 

""' m embodiments where specific hybridization occurs via triplex formation, ' 

10 coding of tag sequences follows the same principles as for duplex-forming tags; : '' 
however, there are further constraints on the selection of subunit sequences. 
Generally; third strand association via Hbogsteen type of binding is most stable along 
homopyrimidine^homopurine tracks in a double 'stranded target. Usually; base triplets ; 
form in T-A*T or C-G*C motifs (Where "-" indicates Watson-Crick pairing and 

5 indicates Hoogsteen type of binding); however, other motifsare also possible. For 
example. Hoogsteen base pairing permits parallel and antiparallel orientations 
between the third strand (the Hoogsteen strand) and the purine-rich strand of the 
duplex to which the third strand binds, depending on conditions and the composition 
of the strands. There is extensive guidance in the literature for selecting appropriate 
sequences, orientation, conditions, nucleoside type (e.g. whether ribose or 
deoxyribose nucleosides are employed), base modifications (e.g. methylated cytosine. 
and the like) in order to maximize, or otherwise regulate, triplex stability as desired in 
particular embodiments, e.g. Roberts et al, Proc. Natl. Acad. Sci., 88: 9397-9401 
(1991); Roberts et al, Science, 258: 1463-1466 (1992); Roberts et al, Proc. Natl. 
Acad. Sci., 93: 4320-4325 (1996); Distefano et al, Proc. Natl. Acad. Sci , 90: 1 179- 
1 183 (1993); Mergny et al, Biochemistry, 30: 9791-9798 (1991); Cheng et al, J. Am. 
Chem. Soc, 1 14: 4465-4474 (1992); Beal and Dervan, Nucleic Acids Research, 20: 
2773-2776 (1992); Beal and Dervan, J. Am. Chem. Soc, 1 14: 4976-4982 (1992); 
Giovannangeli et al, Proc. Natl. Acad. Sci., 89: 8631-8635 (1992); Moser and Dervan, 
Science, 238: 645-650 (1987); McShan et al, J. Biol. Chem., 267:5712-5721 (1992); 
Yoon et al, Proc. Natl. Acad. Sci., 89: 3840-3844 (1992); Blume et al, Nucleic Acids 
Research, 20: 1777-1784 (1992); Thuong and Helene, Angew. Chem. Int. Ed. Engl. 
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32: 666-690 (1993); Escude etal, Proc Natl. Acad. Sci.; 93: 4365-4369 (1996); and 
the like. Conditions for annealing single-stranded or duplex tags to their single- 
stranded or duplex complements are well known, e.g. Ji et al s AriaL Chem. 65: i 1323- 
1328 (1993); Cantor et al/U.S/^ has 
5 the advantage of hot requiring a "stripping" reaction witfrpolymerase' to expose the 
tag for annealing to 'its complement. ' v " *' : - 

Preferably, oligonucleotide tags of the invention employing triplex 
hybridization are double stranded DMA and the corresponding tag complements are 
single stranded. More preferably, 5-methylcytosine is used in place of cytosine in the 

10 tag complements in ! order td broaden the range of pH stability of the triplex formed 
between a tag and its complement. Preferred conditions for forming triplexes are - 
ftilly disclosed in the above references. Briefly, hybridization takes place in ^ * 
concentrated salt solution, e.g. 1.0 M NaCl, 1 .0 M potassium acetate, or the like ; at^ 
pH below 5^5 ( or 6.5 if 5-methylcytbsihe is employed). Hybridization temperature 

1 5 depends on the length and composition of the tag; however, for an 1 8-20ririer tag of 
longer hybridization at room temperature is adequate. Washes may be conducted 
with less concentrated salt solutions, e.g. 1 0 mM sodium acetate, 100 mM MgCl 2 , pH 
5.8;at room temperature. Tags may be eluted from their tag complements by 
- incubation in a similar salt solution at pH 9.0. N 

20 Minimally cross-hybridizing sets of oligonucleotide tags that form triplexes 

may be generated by the computer program of Appendix Ic, or similar programs. An 
exemplary set of double stranded 8-mer words are listed below in capital letters with 
the corresponding complements in small letters. Each such word differs from each of 
the other words in the set by three base pairs. 

25 

Table V 

Exemplary Minimally Cross-Hybridizing 
Set of DoubleStranded 8-mer Tags 





-AAGGAGAG 


5' 


-AAAGGGGA 


' 5' 


-AGAGAAGA 


5' 


-AGGGGGGG 


3' 


-TTCCTCTC 


3' 


-TTTCCCCT 


3' 


-TCTCTTCT 


3' 


-TCCCCCCC 


3' 


-ttcctctc 


3' 


-tttcccct 


3' 


-tctcttct 


3' 


-tccccccc 




-AAAAAAAA 


5' 


-AAGAGAGA 


t * 

•j 


-AGGAAAAG 


5' 


-GAAAGGAG 


3' 




3' 


-TTCTCTCT 


3' 


-TCCTTTTC 


3' 


-CTTTCCTC 


3' 


-tttttttt 


3' 


-ttctctct 


3' 


-tccttttc 


3' 


-ctrtcctc 


C t 


-AAAAAGGG 


5' 


-AGAAGAGG 


5' 


-AGGAAGGA 


5' 


-GAAGAAGG 


3' 


-TTTTTCCC 


3' 


-TCTTCTCC 


3' 


-TCCTTCCT 


3' 


-CTTCTTCC 


3' 


-tttttccc 


3' 


-tcttctcc 


V 


-tcct tcct 


3' 


-cttcttcc 


c * 
-J 


-AAAGGAAG 


5' 


-AGAAGGAA 


5' 


-AGGGGAAA 


5' 


-GAAGAGAA 


y 


-TTTCCTTC 


3' 


-TCTTCCTT 


3' 


-TCCCCTTT 


3' 


-CTTCTCTT 


3' 


-rrtccttc 


3' 


-tcttcctt 


3' 


-tccccttt 


3' 


-cttctctt 
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; - , ; - . : TableVI; r , 

Repertoire Size of Various Double Stranded Tags 
That Form Triplex^ with Their tap Complement 

i - ;: " "i- ' Nucleotide:- ■ f. i • .n: :]■•:• :■■.■<•;.■ 
Difference 

'■• ■ ! '" between >' , Maximal Size" : • "' "' 

Oligonucleotides - of Minimally .. Size of 

Ohgonucleotid > of Minimally ' ' C ro'ss- • Repertoire ' Size of 

Word ' ; . Cross-. ... Hybridizing. with Four Repertoire with 

Word Hybr,d.z,ngSe, Set Words Five Words 



4 


2 


v.- 8 ' 


4096 


3.2 xlO 4 


6 


3 


8 


1 4096. 


3.2 xlO 4 


8 


3 


16 


6.5 x 10 4 


1.05 x 10 6 


10 


5 




4096 




15 


5 


' 92 


f • * i . i - ■ 




20 


6 


765 ( 






20 


8 


92 






20 


10 


22 







Preferably, repertoires of double stranded oligonucleotide tags of the invention 
contain at least 10 members; more preferably, repertoires of such tags contain at least 
1 00 members. Preferably, words are between 4 and 8 nucleotides in length for 
combinatorial^ synthesized double stranded oligonucletide tags, and oligonucleotide 
tags are between 1 2 and 60 base pairs in length. More preferably, such tags are 
between 1 8 and 40 base pairs in length. 



Solid Phase Supports 
Solid phase supports for use with the invention may have a wide variety of 
forms, including microparticles, beads, and membranes, slides, plates, micromachined 
chips, and the like. Likewise, solid phase supports of the invention may comprise a 
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wide variety of compositions, including glass, plastic, silicon,' alkanethiolate- 
derivatized gold, cellulose, low cross,linked and Wgh'cro%l^ed.poiystyrene. silica 
gel, polyamide. and the like. , Preferably, either a population of discrete particles are 
employed such that each has a uniform coating, or population^ of complementary 
sequences of the same tag (and no otlier) v or a single or a few supports are employed 
with spatially discrete regions each containing ; a uniform coating, or population, of 
complementary sequences to me same tag (and no other). In the latter embodiment, 
the area of the regions may vary according to particular applications; usually,, the 
regions range in area from several um2, e.g. .3,5, to several hundred um2, e.gJOO- 
500. Preferably, such regions are spatially discrete so that signals generated by 
events, e.g. fluorescent emissions, at adjacent regions can be resolved by the detection 
system being employed. In some applications, it may be desirable to have regions 
with uniform coatings of more than one tag complement, e.g. for simultaneous 
sequence analysis, or for bringing separately tagged molecules into close proximity. 

Tag complements may be used with the solid phase support that they are 
synthesized on, or they may be separately synthesized and attached to a solid phase 
support for use, e.g. as disclosed by Lund et al, Nucleic Acids Research, 16: 10861- 
10880 (1988); Albretsen et al, Anal. Biochem., 189: 40-50 (1990); Wolf et al, Nucleic 
- Acids Research, 15: 291 1-2926 (1987); or Ghosh et al, Nucleic Acids Research, 15: 
20 5353-53 72 ( 1 987). Preferably, tag complements are synthesized on and used with the 
same solid phase support, which may comprise a variety of forms and include a 
variety of linking moieties. Such supports may comprise microparticles or arrays or 
matrices, of regions where uniform populations of tag complements are synthesized. 
A wide variety of microparticle supports may be used with the invention, including 
microparticles made of controlled pore glass (CPG), highly cross-linked polystyrene, 
acrylic copolymers, cellulose, nylon, dextran, latex,, pojyacrolein, and the like, 
disclosed in the following exemplary references. Meth. Enzymol., Section A, pages 
1 1-147, vol. 44 (Academic Press, New York, 1976); U.S. patents 4.678.814; 
4,413,070; and 4,046;720; and Pon, Chapter 1 9, in Agrawal, editor. Methods in 
Molecular Biology, Vol. 20, (Humana Press, Totowa, NJ, 1993). Microparticle 
supports further include commercially available nucleoside-derivatized CPG and 
polystyrene beads (e.g. available from Applied Biosystems, Foster City. CA); 
derivatized magnetic beads; polystyrene grafted with polyethylene glycol (e.g.. 
TentaGel™, Rapp Polyrnei . e> Tubingen Germany); and the like. Selection of the 
support characteristics, such as material, porosity, size, shape, and the like, and the 
type of linking moiety employed depends on the conditions under which the tags are 
used. For example, in applications involving successive processing with enzymes, 
supports and linkers that minimize steric hindrance of the enzymes and that facilitate 
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access to substrate are preferred. Other important factors to be considered in selecting 
the most appropriate microparticle support include size uniformity, efficiency as a 
synthesis support, degree to which surface area known, and optical properties, e.g. as 
explain more fully below, clpar sinooth beads provide instrumentational advantages ; 
whep handling large numbers of beads on a surface. 

Exemplary linking moieties for attaching: and/or synthesizing tags on 
microparticle surfaces are disclosed .in.Ponetal, Biotechniques, 6:768-775 (1988); 
Webb, U.S. patent 4,659,774; Barany et al^ International patent application 
PCT/US91/06103; Brown et al, J. Chem. Soc. Commun., 1989: 891r893; Damha et 
al. Nucleic Acids Research, 1 8: 3813r3821; ; (1990); Beattie et al, Clinical Chemistry, 
39:71 9-722 ( 1 993 ); Maskos and Southern, Nucleic Acids Research, 20: -1679-1 684 
(1992); and the like, 

As mentioned above, tag complements may also, be synthesized on a single = 
(or a few) solid phase support to form an, array, of regions uniformly coated with tag 
complements. That is, within each region in such an array the same tag complement 
is synthesized. Techniques for synthesizing such arrays are disclosed in McGall et al, 
International application PCT/US93/03767; Pease et al, Proc. Natl. Acad. Sci., 91: 
5022-5026 (1994); Southern and Maskos, International application 
-PCT/GB89/01 1 14; Maskos and Southern (cited above); Southern et al, Genomics, 13: 
1008-1017 (1992); and Maskos and Southern, Nucleic Acids Research, 21 : 4663- 
4669(1993). 

Preferably, the invention is implemented with microparticles or beads 
uniformly coated with complements of the same tag sequence. Microparticle supports 
and methods of covalently or noncovalently linking oligonucleotides to their surfaces 
are well known, as exemplified by the following references: Beaucage and Iyer (cited 
above); Gait, editor, Oligonucleotide Synthesis: A Practical Approach (IRL Press, 
Oxford, 1984); and the references cited above. Generally, the size and shape of a 
microparticle is not critical; however, microparticles in the size range of a few, e.g. 1- 
2, to several hundred, e.g. 200-1000 diameter are preferable, as they facilitate the 
construction and manipulation of large repertoires of oligonucleotide tags with 
minimal reagent and sample usage. 

In some preferred applications, commercially available controlled-pore glass 
(CPG) or polystyrene supports are employed as solid phase supports in the invention. 
Such supports come available with base-labile linkers and initial nucleosides attached, 
e.g. Applied Biosystems (Foster City, CA). Preferably, microparticles having pore 
size between 500 and 1000 angstroms are employed. 

In other preferred applications, non-porous microparticles are employed for 
their optical properties, which may be advantageously used when tracking large 
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numbers of microparticles on planar supports, such as a microscope slide. - 
Particularly preferred non-porous microparticles are the glycidal methacrylate (GMA) 
beads available from Bangs Laboratories {Canriel, IN). Such micropairticles are - 1 ■ 
useful in a variety of sizes and derivatized with a variety of linkage groups for 
5 synthesizing tags or tag complements. Preferably; for massively parallel , : 
manipulations of tagged microparticles, 5 jim diameter GMA beads are employed. 



Attaching Tags to Polynucleotides 
For Sorting onto Solid Phase Supports 1 
An important aspect of the invention is the sorting and attachmentof a 
populations of polynucleotides, e.g. from a cDNA library, to microparticles or to 

1 5 separate regions on a solid phase support such that each microparticle or region has 
substantially only one kind of polynucleotide attached. This objective is 
accomplished by insuring that substantially all different polynucleotides have 
different tags attached. This condition, in turn, is brought about by taking a sample of 
- the full ensemble of tag-polynucleotide conjugates for analysis. (It is acceptable that 

20 identical polynucleotides have different tags, as it merely results in the same 

polynucleotide being operated on or analyzed twice in two different locations:) Such 
sampling can be carried out either overtly-for example, by taking a small volume 
from a larger mixture-after the tags have been attached to the polynucleotides, it can 
be carried out inherently as a secondary effect of the techniques used to process the 

25 polynucleotides and tags, or sampling can be carried out both overtly and as an 
inherent part of processing steps. 

Preferably, in constructing a cDNA library where substantially all different 
cDNAs have different tags, a tag repertoire is employed whose complexity, or number 
of distinct tags, greatly exceeds the total number of mRNAs extracted from a cell or 

30 tissue sample." Preferably, the complexity of the tag repertoire is at least 10 times that 
of the polynucleotide population; and more preferably, the complexity of the tag 
repertoire is at least 100 times that of the polynucleotide population; Below, a 
protocol is disclosed for cDNA library construction using a primer mixture that 
contains a full repertoire of exemplary 9-word tags. Such a mixture of tag-containing 

35 primers has a complexity of 8 9 , or about 1 .34 x 1 0 8 . As indicated by Winslow et al, 
Nucleic Acids Research, 19: 3251-3253 (1991), mRNA for library construction can 
be extracted from as few as 10-100 mammalian cells. Since a single mammalian cell 
contains about 5 x 10 5 copies of mRNA molecules of about 3.4 x 10 4 different kinds. 
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by standard techniques one can isolate the mRNA from about. 100 cells, or . ... . . 

(theoretically) about 5 x 10 7 mRNA molecules.. Comparing this .number to the ; , 
complexity of the primer mixture shows that without any additional steps, and even 
assuming that mRNAs are converted into .cD^As^wim.perfect efficiency (1% 
efficiency or less is more accurate), the cDNA library construction protocol results in 
a population containing no more than 37% of the total number of different tags* That 
is, without any overt sampling step at. all, the protocol inherently generates, a sample 
that comprises 37%, or less, of the tag repertoire. ^probability of obtaining a 
double under these, conditions is about 5%, which, is within the preferred.range. With 
mRNA from 10 cells, the fraction of the tag repertoire sampled is reduced to only 
3.7%, even assuming that all the processing steps take place, at 100% efficiency. In 
fact, the efficiencies of the processing steps for constructing cpNA libraries are very 
low, a "rule of thumb" being that gopd .library should contain about 10 8 cDNA clones 
from mRNA extracted from 10 6 mammalian cells. 

Use of larger amounts of mRNA in the above protocol, or for larger amounts 
of polynucleotides in general, where the number of such molecules exceeds the 
complexity of the tag repertoire, a tag-polynucleotide conjugate mixture potentially 
contains every possible pairing of tags and types of mRNA or polynucleotide. In such 
cases, overt sampling may be implemented by removing a sample volume after a 
serial dilution of the starting mixture of tag-polynucleotide conjugates. The amount 
of dilution required depends on the amount of starting material and the efficiencies of 
the processing steps, which are readily estimated. 

If mRNA were extracted from 10 6 cells (which would correspond to about 0.5 
ug of poly(A)' RNA), and if primers were present in about 10-100 fold concentration 
excess-as is called for in a typical protocol, e.g. Sambrook et al, Molecular Cloning, 
Second Edition, page 8.61 [10 uL 1.8 kb mRNA at 1 mg/mL equals about 1.68 x 10'" 
moles and 1 0 jxL 1 8-mer primer at 1 mg/mL equals about 1 .68 x 1 0' 9 moles], then the 
total number of tag-polynucleotide conjugates in a cDNA library would simply be 
equal to or less than the starting number of mRNAs, or about 5 x 1 0 1 1 vectors 
containing tag-polynucleotide conjugates-again this assumes that each step in cDNA 
construction-first strand synthesis, second strand synthesis, ligation into a vector- 
occurs with perfect efficiency, which is a very conservative estimate. The actual 
number is significantly less. 

If a sample of n tag-polynucleotide conjugates are randomly drawn from a 
reaction mixture~as could be effected by taking a sample volume, the probability of 

drawing conjugates having the same tag is described by the Poisson distribution 
-A. ' 
P(r)=e' (A.)7r, where r is the number of conjugates having the same tag and X=np, 

where p is the probability of a given tag being selected. If n=l 0 6 and p=l/(l .34 x 



-24- 



WO 97/13877 



PCT/US96/I6342 



10 8 ), then X=.00746 and P(2)=2.76 x 10* 5 . Thus, a sample of one million molecules 
gives rise to an expected number of doubles well within the preferred range. Such a 
sample is readily obtained as follows: Assume.that the 5 x 10 11 mRNAs are!perff ct\y 
converted into 5 x 10 M vectors with tag-cDNA conjugates as inserts and that the 5 x 
5 JO 1 ! vectors are in a reaction solution having a volume : of 100 \xl. Four lOrfold serial 
dilutions may be carried out by transferring 1 0 fjtl from the original solution into a 
vessel containing 90 fai of an appropriate buffer, such as TE. This process may be 
repeated for three additional dilutions to obtain a 100 solution containing 5 x 10 5 
vector molecules per pi. A 2 t |il aliquot from this solution yields 10 6 vectors ! 
1 0 containing tag-cDNA conjugates as inserts. This sample is then amplified by straight 
forward transformation of a competent host cell followed by culturing. 

Of course, as mentioned above, no step in the above process proceeds with 
perfect efficiency. In particular, when vectors are employed to amplify a sample of 
tag-polynucleotide conjugates, the step of transforming a host is very inefficient. 
1 5 Usually, no more than 1 % of the vectors are taken up by the host and replicated. 
Thus, for such a method of amplification, eVen fewer dilutions would be required to 
obtain a sample of 10 6 conjugates! "•■ 

A repertoire of oligonucleotide tags can be conjugated to a population of 
- polynucleotides in a number of ways, including direct enzymatic- ligation,' 
20 amplification, e.g. via PCR, using primers containing the tag sequences, and the like. 
The initial ligating step produces a very large population of tag-polynucleotide 
conjugates such that a single tag is generally attached to many different 
polynucleotides. However, as noted above, by taking a sufficiently small sample of 
the conjugates, the probability of obtaining "doubles," i.e. the same tag on two 
25 different polynucleotides, can be made negligible. Generally, the larger the sample 
the greater the probability of obtaining a double. Thus, a design trade-off exists 
between selecting a large sample of tag-polynucleotide conjugates- which, for 
example, ensures adequate coverage of a target polynucleotide in a shotgun 
sequencing operation or adequate representation of a rapidly changing mRNA pool, 
30 and selecting a small sample which ensures that a minimal number of doubles will be 
present. In most embodiments, the presence of doubles merely adds an additional 
source of noise or, in the case of sequencing, a minor complication in scanning and 
signal processing, as microparticles giving multiple fluorescent signals can simply be 
ignored. 

35 As used herein, the term "substantially all" in reference to attaching tags to 

molecules, especially polynucleotides, is meant to reflect the statistical nature of the 
sampling procedure employed to obtain a population of tag-molecule conjugates 
essentially free of doubles. The meaning of substantially all in terms of actual 
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percentages of tag-molecule conjugates depends on how the tags are being employed. 
Preferably, for nucleic acid sequencing, substantially all means that at least eighty 
percent of the polynucleotides have unique tags attached. More preferably, it means 
that at least ninety percent of the polynucleotides have unique tags attached. . Still 
5 more preferably, it means that at least ninety : fi ve percent of the polynucleotides have 
unique tags attached. And, most preferably, it means that, at least ninety-nine percent 
of me polynucleotides have unique tags attached. 

Preferably, when the population of polynucleotides consists of messenger 
RNA (mRNA), oligonucleotides tags may be attached by reverse transcribing the 
1 0 mRNA with a set of primers preferably containing complements of tag sequences. 
An exemplary set of such primers could have the following sequence (SEQ ID NO: 

i): '.. . . ; \ V ., 1 .., \ ' I 

/ . 5 ' -mRNA- [ A ] n - 3 ' 
15 ..... I T ] 1 9 GG [ W , W , W , ,C ] qAC CAGCTG ATC- 5 ' -bi o t i n 

where "[W,W,W,C]9" represents the sequence of an oligonucleotide tag of nine 
- suburiits of four nucleotides each and "[W,W,W,C]" represents the subunit sequences 
20 listed above, i.e. " W" represents T or A. The underlined sequences identify an 

optional restriction endonuclease site that can be used to release the polynucleotide 
from attachment to a solid phase.support via the biotin, if one is employed. For the 
above primer, the complement attached to a microparticle could have the form: 

25 . 5'-[G,W,W,W] 9 TGG-linker-microparticle 

After reverse transcription, the mRNA is removed, e.g. by RNase H digestion, 
and the second strand of the cDNA is synthesized using, for example, a primer of the 
following form (SEQ ID NO: 2): 

30 

5 ' -NRRGATCYNNN-3 ' 

where N is any one of A, T, G, or C; R is a purine-containing nucleotide, and Y is a 
pyrimidine-containing nucleotide. This particular primer creates a Bst Yl restriction 
35 site in the resulting double stranded DNA which, together with the Sal I site, 

facilitates cloning into a vector with, for example, Bam HI and Xho 1 sites. After Bst 
Yl and Sal I digestion, the exemplary conjugate would have the form: 
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5'-RCGACCA[C,W,W,W]9GG[T]ig- cDNA -NNNR 

GGT [G, W, W, W] ^GC [A] i'g- rDNA -NNNYCTAG-5 1 

The polynucleotide-tag conjugates may then be manipulated using standard molecular 
5 biology techniques. For example, the above conjugate— which is actually a mixture- 
may be inserted into commercially available cloning vectors, e.g. Stratagene Cloning 
System (La Jolla, CA); transfected into a host, such as a commercially available host 
bacteria; which is then cultured to increase the number of cohjiijgates. The cloning 
vectors may then be isolated using standard techniques, e.g. Sambrook et al, 
10 Molecular Cloning, Second Edition (Cold Spring Harbor Laboratory, New York, 
1 989). Alternatively, appropriate adaptors and primers may be employed so that the 
conjugate population can be increased by PCR!. 

Preferably, when the ligase-based method of sequencing is employed, tHe'Bst 
Yl and Sal I digested fragments are cloned into a Bam HI-/Xho 'Indigested vector 
1 5 having the following single-copy restriction sites (SEQ ID NO: 3): 

5 ' -GA GGATG CCTTTAT GGAT CC A CTCGAG ATCCCAATCCA-3 ' 
- . Fokl BamHI Xhol ' : " l 

20 

This adds the Fok I site which will allow initiation of the sequencing process 
discussed more ftilly below. 

Tags can be conjugated to cDNAs of existing libraries by standard cloning 
methods. cDNAs are excised from their existing vector, isolated, and then ligated into 

25 a vector containing a repertoire of tags. Preferably, the tag-containing vector is 

linearized by cleaving with two restriction enzymes so that the excised cDNAs can be 
ligated in a predetermined orientation. The concentration of the linearized tag- 
containing vector is in substantial excess over that of the cDN A inserts so that 
ligation provides an inherent sampling of tags. 

30 A general method for exposing the single stranded tag after amplification 

involves digesting a target polynucleotide-containing conjugate with the 5'->3 r 
exonuclease activity of T4 DNA polymerase, or a like enzyme. When used in the 
presence of a single deoxynucleoside triphosphate, such a polymerase will cleave 
nucleotides from 3' recessed ends present on the non-template strand of a double 

35 stranded fragment until a complement of the single deoxynucleoside triphosphate is 
reached on the template strand. When such a nucleotide is reached the 5'->3' 
digestion effectively ceases, as the polymerase's extension activity adds nucleotides at 
a higher rate than the excision activity removes nucleotides. Consequently, single 
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stranded tags constructed with three nucleotides are readily prepared for loading onto 
solid phase supports. . ....... 

The technique may also be used to,preferentially ; methylate interior Foklsites 
of a target polynucleotide while leaving, a single Fok 1 siteat the terminus.of the Cj , 
, 5 Polynucleotide unmethylated. First, the, terminal Fok I site is rendered single stranded 
using a polymerase with deoxycyudine triphosphate., .The double stranded portion of 
the fragment is then methylated, after which the single stranded terminus is filled in 
with a DNA polymerase in the presence of all four nucleoside triphosphates, thereby 
regenerating the Fok 1 site. Clearly, this procedure, can be generalized to 
0 endonucleases other than Fok I. 

After the oligonucleotide tags are prepared for specific hybridization, e.g. by 
rendering them single stranded as described above, the polynucleotides are mixed, 
with microparticles containing the complementary sequences of the tags under 
conditions mat favor the formation of perfectly matched duplexes between the tags 
5 and their complements. There is extensive guidance in the literature for creating these 
conditions. Exemplary references providing.such guidance include Wetmur, Critical 
Reviews in Biochemistry and Molecular Biology, 26: 227 r 259. (1991); Sambrook et 
al, Molecular Cloning: A Laboratory Manual, 2nd Edition (Cold Spring Harbor , 
- Laboratory, New York, 1989); and the like. Preferably, the hybridization conditions 
are sufficiently stringent so that only .perfectly matched ; sequences form stable 
duplexes. Under such conditions the polynucleotides specifically hybridized through 
their tags may be ligated to the complementary sequences attached to, the 
microparticles. Finally, the microparticles are washed to remove polynucleotides with 
unligated and/or mismatched tags. 

When CPG microparticles conventionally employed as synthesis supports are 
used, the density of tag complements on the microparticle surface is typically greater 
than that necessary for some sequencing operations. That is, in sequencing 
approaches that require successive treatment of the attached polynucleotides with a 
variety of enzymes, densely spaced polynucleotides may tend to inhibit access of the 
relatively bulky enzymes to the polynucleotides. In such cases, the polynucleotides 
are preferably mixed with the microparticles so that tag complements are present in 
significant excess, e.g. from 10:1 to 100:1, or greater, oyer the polynucleotides. This 
ensures that the density of polynucleotides on the microparticle surface will not be so 
high as to inhibit enzyme access. Preferably, the average inter-polynucleotide spacing 
on the microparticle surface is on the order of 30-100 nm. Guidance in selecting 
ratios for standard CPG supports and Ballotini beads (a type of solid glass support) is 
found in Maskos and Southern, Nucleic Acids Research, 20: 1679-1684 (1992). 
Preferably, for sequencing applications, standard CPG beads of diameter in the range 
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of 20-50 nm are loaded with about 10^ polynucleotides, arid GM A beads of diameter 
in the range of 5-10 jam are loaded with a few tens of thousand of pblyriudeotides, 
e.g. 4x l6 4 to6x'l0 4 ' — ' ■ 71 ' ; i !; '~ ' " ' 

In the preferred embodiment, tag complements are synthesized on 
5 microparticles combinatorial^; thus, at the end of the synthesis, one obtains a 

complex mixture of microparticles from which a sample is taken for loading tagged 
polynucleotides. The size of the sample of microparticles will depend on several 
factors, including the size of the repertoire of tag complements, the nature of the 
apparatus for used for observing loaded microparticles— e.g. its capacity, the tolerance 
1 0 for multiple copies of microparticles with the same tag complement (i.e. "bead 
doubles"), and the like. The following table provide guidance regarding 
microparticle sample size, microparticle diameter, and the approximate physical 
dimensions of a packed array of microparticles of various diameters. 

15 . . . ; 

Microparticle diameter 
Max. no. 

polynucleotides loaded 
at 1 per 10 5 sq. 
angstrom 

Approx. area of 
monolayer of 10 6 
microparticles 

20 The probability that the sample of microparticles contains a given tag complement or- 
is present in multiple copies is described by the Poisson distribution, as indicated in 
the following table. 

25 

Table Vll 



5 um lOfim 20 |im 40 nm 



3 x !0 5 1.26 x 10 6 5 x 10?. 



.45 x .45 cm 1 x I cm 2 x 2 cm 4x4 cm 
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^vNumber of , - . : 
microparticles in . . 
sample (as fraction ,r ' 
* of repertoire size), ; . 
m 


Fraction of 
i. J : : repertoire of tag; . 
complements 
present in 1 
(Sample, . 
l-e' m 


• ■ ' ' . ; ■ - 1- . • •»_ 

r rat nun ui 

if " -mirrnnartinlpc in. ; ' . 

* iiiiui vipariiciOi in*. ■ -• 

saninle with iintmip 

. *«ull|J*v Willi U1UW|UC 
•Ag WUf llJJldJICIll 

auawnCQ; 
■ iTl^C fix. 


1.000 . 


0.63 


f i f 0.37 - 


693 


. . . 0.50 . . 


, 0.35 ........ 


'" " ' .405' " 


0.33 


0.27 


'Kv- ,285. ■ 




0.21 


223 ... 


0.20 


- 0.18. 


.105 


0.10 


0.09 


vj> - oio / : ... 


0.01* .> • v 


0.01 ' v ; 



Fraction of 
micropanicles in 
; sample carrying 

same tag 
' complement as one 
other microparticle 

in sample 
("bead doubles"), 
m 2 (e' m )/2- 



0.18 
0.12 
0.05 
0.03 
0.02 
0.005 



High Spe cificity Sorting and Panning 
5 The kinetics of sorting depends on the rate of hybridization of oligonucleotide 

tags to their tag complements which, in .turn, depends on the complexity of the tags in 
-.the hybridization reaction. Thus, a trade off exists between sorting rate and tag 
complexity, such that an increase in sorting rate may be achieved at the cost of 
reducing the complexity of the tags involved in the hybridization reaction^ As 

1 0 explained below, the effects of this trade off may be ameliorated by "panning." 
^ Specificity of the hybridizations may be increased by taking a sufficiently 
small sample so that both a high percentage of tags in the sample are unique and the 
nearest neighbors of substantially all the tags in a sample differ by at least two words. 
This latter condition may be met by taking a sample that contains a number of tag- 

1 5 polynucleotide conjugates that is about 0. 1 percent or less of the size of the repertoire 
being employed. For example, if tags are constructed with eight words selected from 
Table II, a repertoire of 8 8 , or about 1 .67 x 1 0 7 , tags and tag complements are 
produced. In a library of tag-cDNA conjugates as described above, a 0. 1 percent 
sample means that about 16,700 different tags are present. If this were loaded directly 

20 onto a repertoire-equivalent of microparticles, or in this example a sample of 1 .67 x 
10 7 microparticles, then only a sparse subset of the sampled microparticles would be 
loaded. The density of loaded microparticles can be increase--for example, for more 
efficient sequencing--by undertaking a "panning" step in which the sampled tag- 
cDNA conjugates are used to separate loaded microparticles from unloaded 

25 microparticles. Thus, in the example above, even though a M 0. 1 percent" sample 
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contains only 16,700 cDNAs, the sampling and panning steps may be repeated until 
as many loaded micropartides as desired are accumulated. ' i: ' 

A panning step may be implemented by providing a sample of tag-cDN A 
conjugates each of which contains a capture moiety ait an end opposite, or distal to, 
5 the oligonucleotide tag. ' Preferably, the' capture moiety is of a type which can be : f 
released from the tag-cDN A conjugates, so that the taig-cDNA conjugates can be 
sequenced with a single-base sequencing method. Such moieties may comprise 1 
biotin, digbxigenin^ or like ligands, a triplex binding region, or the like. Preferably, 
such a capture moiety comprises a biotin component. Biotin may be attached to tag- 
10 cDNA conjugates by a number of standard techhiquesV If appropriate adapters 

containing PCR primer binding sites are attached to tag-cDNA conjugates, biotin may 
be attached by using a biotinylated primer in an amplification after sampling. ' 
Alternatively, if the tag-cDNA conjugates are inserts of cloning vectors, biotin may be 
attached after excising the tag-cDNA conjugates by digestion with an appropriate 
1 5 restriction enzyme followed by isolation and filling in a protruding strand distal to the 
tags with a DNA polymerase in the presence of biotinylated uridine triphosphate; 

After a tag-cDNA conjugate is captured, it may be released from the biotin 
moiety in a number of ways, such as by a chemical linkage that is cleaved by : 
-reduction, e.g. Herman et al, Anal. Biochem., 156: 48-55 (1986), or that is cleaved 
20 photochemically, e.g. Olejnik et al, Nucleic Acids 1 Research, 24: 361-366 (1996), or 
that is cleaved enzymatically by introducing a restriction site in the PCR primer. The 
latter embodiment can be exemplified by considering the library of tag-polynucleotide 
conjugates described above: 

25 5 1 -RCGACCA[C f W, W, W] gGG [T] 19- cDNA -NNNR 

GGT[G,W,W,W] 9 CC[A] 19 - rDNA -NNNYCTAG-5 ' 

The following adapters may be ligated to the ends of these fragments to permit 
amplification by PCR: 



30 
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5 ' - xxxxxxxxxxxxxxxxxxxx 

XXXXXXXXXXXXXXXXXXXXYGAT 
Right Adapter 



GATCZZACTAGTZZZZZZZZZZZZ-B' 
40 ZZTGATCAZZZZZZZZZZZZ 
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Left Adapter 

" -•■ ■ ■ •■■ - •-. r,. . . . • ,. . ..,.■<■■ ■ _ 

ZZT«TCAZzizzZZ22ZZZ-5" , i-bidtin * ! " 

Left Primer "' ■ 

where "ACTAGT" is a Spe I recognition site (which leaves' a staggered cleavage 
ready for single base Sequencing), and trie X's and Z's are nucleotides selected so that 
the annealing arid dissociation temperatures of me respective primers are ' 
approximately the same. After ligation of the adapters and amplification by PCR 
using the' biotinylated primer, the tags of the conjugates' are rendered single stranded 
by the exonuclease activity of T4 DNA polymerase arid conjugates are combined with 
a sample of microparticles, e.g. a repertoire equivalent, with tag complements 
attached.* After annealing under stringent conditions (to minimize mis-attachment'of 
tags), the conjugates are preferably ligated to tneir tag complements and the loaded 
microparticles are separated fromthe unloaded microparticles by capture with 
avidinated magnetic beads, or like capture technique. 

Returning to the example, this process results in the accumulation of about 
10,500 (=16,700 x .63) loaded microparticles with different tags, which may be 
released from the magnetic beads by cleavage with Spe I. By repeating this process 
40-50 times with new samples of microparticles and tag-cDNA conjugates, 4-5 x 10 5 
cDNAs can be accumulated by pooling the released microparticles. The pooled 
microparticles may "then be simultaneously sequenced by a single-base sequencing 
technique. 

Determining how many times to repeat the sampling and panning steps-or 
more generally, determining how many cDNAs to analyze, depends on one's 
objective. If the objective is to monitor the changes in abundance of relatively 
common sequences, e.g. making up 5% or more of a population, then relatively small 
samples, i.e. a small fraction of the total population size, may allow statistically 
significant estimates of relative abundances. On the other hand, if one seeks to 
monitor the abundances of rare sequences, e.g. making up 0.1 % or less of a 
population, then large samples are required. Generally, there is a direct relationship 
between sample size and the reliability of the estimates of relative abundances based 
on the sample. There is extensive guidance in the literature on determining 
appropriate sample sizes for making reliable statistical estimates, e.g. Koller et al. 
Nucleic Acids Research, 23:185-191 (1994); Good, Biometrika, 40: 16-264 (1953); 
Bunge et al, J. Am. Stat. Assoc., 88: 364-373 (1993); and the like. Preferably, for 
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monitoring changes in gene expression based on the analysis of a.series of cDNA 
libraries containing 10 5 - to 10 8 independent clones of 3.0-3.5 x 10 4 different 
sequences, a sample of, at least 10 4 sequences are accumulated for anajysis of each 
library.; More preferably^a sample of at least JQ 5 ,sequences are, accumulated for the 
5 analysis of each library; and most .preferably, a sample of at least 5 x 10 5 sequences 
are accumulated for the analysis of each library. ; Alternatively, the number of 
sequences sampled is preferably sufficient to estimate, the relative abundance of a 
sequence present at a frequency within thexange of 0. 1 % to 5% with a 95% 
confidence limit no larger than 0-1% of the population size. , n 

Single Base DNA Sequencing , 
The present invention can be employed with conventional methods of DNA 
sequencing, e.g. as disclosed by Hultman et al, Nucleic Acids Research, 17: 4937- , 
4946 (1 989). However, for parallel, or simultaneous, sequencing of multiple 

1 5 polynucleotides, a DNA sequencing methodology is preferred that requires neither 
electrophoretic separation of closely sized DNA fragments nor analysis of cleaved 
nucleotides by a separate analytical procedure, as in peptide sequencing. Preferably, 
the methodology permits the stepwise identification of nucleotides, usually one at a 
- time, in a sequence through successive cycles of treatment and detection. Such 

20 methodologies are referred to herein as "single base" sequencing methods.. Single 
base approaches are disclosed in'the following : refer;ences: Cheeseman, U.S. patent 
5,302,509; Tsien et al, International application WO 91/06678; Rosenthal et al, 
International application WO 93/21340; Canard et al, Gene, 148: 1-6 (1994); and 
Metzker et al, Nucleic Acids Research, 22: 4259-4267 ( 1 994). 

25 A "single base" method of DNA sequencing which is suitable for use with the 

present invention and which requires no electrophoretic separation of DNA fragments 
is described in International application PCT/US95/03678. Briefly, the method 
comprises the following steps: (a) ligating a probe to an end of the polynucleotide 
having a protruding strand to form a ligated complex, the probe having a 

30 complementary protruding strand to that of the polynucleotide and the probe having a 
nuclease recognition site; (b) removing unligated probe from the ligated complex; (c) 
identifying one or more nucleotides in the protruding strand of the polynucleotide by 
the identity of the ligated probe; (d) cleaving the ligated complex with a nuclease; and 
(e) repeating steps (a) through (d) until the nucleotide sequence of the polynucleotide. 

35 or a portion thereof, is determined. 

A single signal generating moiety, such as a single fluorescent dye, may be 
employed when sequencing several different target polynucleotides attached to 
different spatially addressable solid phase supports, such as fixed microparticles. in a 
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parallel sequencing operation. This may be accomplished by providing four sets of 
probes that are applied sequentially to. the. plurality of target polynucleotides on the 
different microparticles. An.exemplary set of such probes are shown below: 





Set I 




Set 2 




i • • ? 


Set 3 


. . • < 

■ i . , 


Set 4 




ANNNN . , 
N. , 


. . NN 

. . NNTT . . 


. T* 


dANNNN.. 
d N . 


,.NN ; 
. .NNTT. 


. .T 


dANNNN 
N 


... NN ( , 
. . .NNTT. 


. .T 


dANNNN. 
N. 


. .NN 
. . NNTT . 


. .T 


dCNNNN . 
N. 


■'•NN, 
. . NNTT . 


. .T 


CNNNN ... 
N . . 


.NN 

.NNTT. . 


.T* 


, d CNNNN 

N, 


. . . NN 
. . . NNTT . 


. .T 

■ 


TtdCNNNN 
,N. 


r..NN ; , 
. .NNTT. 


is: 
• 


dGNNNN . 
N. 


. .NN 
: . NNTT . 


. . T 


dGNNNN . 

' N 


... NN 

. Vnntt: 


. lr 


GNNNR. . 
N. . 


.-. NN ; * 
. NNTT . . 


.T* 


dGNNNN - ; 

N. , 


. ;NN 
. . NNTT . 


t 

. ,T 


dTNNNN . 


. . NN „ 




. dTNNNN. 


. .NN 




dTNNNN . 
N. 






TNNNN. 

N.. : . 


NN 

NNTT . . 




N . 


. .NNTT.' 


; .:t 


. ;n. 


. . NNTT . 


. .T 


. .NN ' 
. . NNTT . 


. . T 


.T* 



where each of the listed probes represents a mixture of 4 3 =64 oligonucleotides such 
that the identity of the 3' terminal nucleotide of the top strand is fixed and the other 
positions in the protruding strand are filled by every 3-mer permutation of nucleotides, 
, or complexity reducing analogs. The listed probes are also shown with a single 
stranded poly-T tail with a signal generating moiety attached to the terminal thymidine, 
shown as "T*".. The "d" on the unlabeled probes designates a ligation-blocking moiety 
or absense of 3'-hydroxyl, which prevents unlabeled probes from being ligated. 
Preferably, such 3'-terminal nucleotides are dideoxynucleotides. In this embodiment, 
the probes of set lare first applied to the plurality of target polynucleotides and treated 
with a ligase so that target polynucleotides having a thymidine complementary to the 3' 
terminal adenosine of the labeled probes are ligated. The unlabeled probes are 
simultaneously applied to minimize inappropriate ligations. The locations of the target 
polynucleotides that form ligated complexes with probes terminating in "A" are 
identified by the signal generated by the label carried on the probe. After washing and 
cleavage, the probes of set 2 are applied. In this case, target polynucleotides forming 
ligated complexes with probes terminating in "C" are identified by location. Similarly, 
the probes of sets 3 and 4 are applied and locations of positive signals identified. This 
process of sequentially applying the four sets of probes continues until the desired 
number of nucleotides are identified on the target polynucleotides. Clearly, one of 
ordinary skill could construct similar sets of probes that could have many variations, 
such as having protruding strands of different lengths, different moieties to block 
ligation of unlabeled probes, different means for labeling probes, and the like. 
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Apparatus for Sequencing Populations of Polynucleotides 
An objective of.the invention is to sort identical molecules, particularly 
polynucleotides, onto the surfaces of microparticles by the specific hybridization of 
tags and their complements. Once such sorting has taken place, the presence of the 
molecules or operations performed on them can be detected in a number of ways 
depending on the nature of the tagged molecule, whether microparticles are detected 
separately or in "batches," whether repeated measurements are desired, and the like. 
Typically, the sorted molecules are exposed to ligands for binding, e.g. in drug 
development, or are subjected chemical or enzymatic processes, e.g. in polynucleotide 
sequencing. In both of these uses it is often desirable to simultaneously observe 
signals corresponding to such events or processes on large numbers of microparticles. 
Microparticles carrying sorted molecules (referred to herein as "loaded" 
microparticles) lend themselves to such large scale parallel operations, e.g. as 
demonstrated by Lam et al (cited above). 

Preferably, whenever light-generating signals, e.g. chemiluminescent, 
fluorescent, or the like, are employed to detect events or processes, loaded 
microparticles are spread on a planar substrate, e.g. a glass slide, for examination with 
a scanning system, such as described in International patent applications 
- PCT/US91/09217, PCT/NL90/00081, and PCT/US95/01886. The scanning system 
should be able to reproducibly scan the substrate and to define the positions of each 
microparticle in a predetermined region by way of a coordinate system. In 
polynucleotide sequencing applications, it is important that the positional 
identification of microparticles be repeatable in successive scan steps. 

Such scanning systems may be constructed from commercially available 
components, e.g. x-y translation table controlled by a digital computer used with a 
detection system comprising one or more photomultiplier tubes, or alternatively, a 
CCD array, and appropriate optics, e.g. for exciting, collecting, and sorting 
fluorescent signals. In some embodiments a confocal optical system may be 
desirable. An exemplary scanning system suitable for use in four-color sequencing is 
illustrated diagrammatically in Figure 5. Substrate 300, e.g. a microscope slide with 
fixed microparticles, is placed on x-y translation table 302, which is connected to and 
controlled by an appropriately programmed digital computer 304 which may be any of 
a variety of commercially available personal computers, e.g. 486-based machines or 
PowerPC model 7100 or 8100 available form Apple Computer (Cupertino, CA). 
Computer software for table translation and data collection functions can be provided 
by commercially available laboratory software, such as Lab Windows, available from 
National Instruments. 
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Substrate 300 and table 302 are operationally associated with, microscope 306 
having one or more objective lenses 308 which are capable of collecting and 
delivering light to microparticles fixed to substrate 300. Excitation beam.310 from 
light source 312, which is preferably a laser, is'directe'd to bearn splitter 3 14, e.g. a 
5 dichroic mirror, which re-directs the beam through microscope 306 and objective lens 
308 which, in turn, focuses the beam onto substrate 300. Lens 308 collects 
fluorescence 316 emitted from the microparticles and directs it through beam splitter 
3 14 to signal distribution optics 318 which, in turn, directs fluorescence to one or 
more suitable opto-electronic devices for converting some fluorescence characteristic. 

1 0 e.g. intensity, lifetime, or the like, to an electrical signal. Signal distribution optics 
3 1 8 may comprise a variety of components standard in the art, such as bandpass 
filters, fiber optics, rotating mirrors, fixed position mirrors and lenses, diffraction 
gratings, and the like. As illustrated in Figure 2, signal distribution optics 318 directs 
fluorescence 316 to four separate photomultiplier tubes, 330, 332, 334, and 336, ♦ 

1 5 whose output is then directed tp pre-amps and photon counters 350, ,3 52, 354, and 
3 56. The output of the photon counters is collected by computer 304, where it can be 
stored, analyzed, and viewed on video 360. Alternatively, signal distribution optics 
318 could be a diffraction grating which directs fluorescent signal 3 1 8 onto a CCD 
- array. 

20 The stability and reproducibility of the positional localization in scanning will 

determine, to a large extent, the resolution for separating closely spaced 
microparticles. Preferably, the scanning systems should be capable of resolving 
closely spaced microparticles, e.g. separated by a particle diameter or less. Thus, for 
most applications, e.g. using CPG microparticles, the scanning system should at least 

25 have the capability of resolving objects on the order of 10-100 urn. Even higher 
resolution may be desirable in some embodiments, but with increase resolution, the 
time required to fully scan a substrate will increase; thus, in some embodiments a 
compromise may have to be made between speed and resolution. Increases in 
scanning time can be achieved by a system which only scans positions where 

30 microparticles are known to be located, e.g from an initial full scan. Preferably, 

microparticle size and scanning system resolution are selected to permit resolution of 
fluorescently labeled microparticles randomly disposed on a plane at a density 
between about ten thousand to one hundred thousand microparticles per cm 2 . 

In sequencing applications, loaded microparticles can be fixed to the surface 

35 of a substrate in variety of ways. The fixation should be strong enough to allow the 
microparticles to undergo successive cycles of reagent exposure and washing without 
significant loss. When the substrate is glass, its surface may be derivatized with an 
alkylamino linker using commercially available reagents, e.g. Pierce Chemical, which 
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in turn may be cross-linked to avidin, again using conventional chemistries, to form 
an avidinated surface. 'Biotin moieties can be introduced to the loaded microparticles 
in a number of ways. For example, a fraction, e.g; 10-15 percerii, of the cloning 
vectors used to attach tags to polynucleotides are engineered to contain a unique 
5 restriction site (providing sticky ends oh digestion) immediately adjacent to the 
polynucleotide insert at an end of thie poiynucleotide opposite of the tag. The site is 
excised With thie polynucleotide and tag for loading onto microparticles. ' After 
loading, about 10-15 percent of the loaded polynucleotides will possess the" unique 
restriction site distal from the micrbparticle surface. After digestion with the " 
i 0 associated restriction endonuclease, an appropriate double stranded adaptor 

containing a oiotin moiety is ligated to the sticky end. The resulting microparticles 
are then spread on the avidinated glass surface where they become fixed via the 
biotin-avidin linkages/ " ! '• ■ >■■'■■■■ >i---' •>■ 

Alternatively arid preferably when sequencing by ligation is employed^ in the 
1 5 initial ligation step a mixture of probes is applied to the loaded microparticle: a 

fraction of the probes contain a type lis restriction recognition site, as required by the 
sequencing method, and a fraction of the probes have rio such recognition site, but 
instead contain a biotin moiety at its non-Iigating end. Preferably, the mixture 
-comprises about 10-15 percent of the biotinylated probe. 

In still another alternative, when DNA-lbaded microparticles 1 are applied to a 
glass substrate, the DNA may nonspecifically adsorb to the glass surface upon several 
hours, e.g. 24 hours, incubation to create a bond sufficiently strong to permit repeated 
exposures to reagents and washes without significant loss of microparticles. 
Preferably, such a glass substrate is a flow cell, which may comprise a channel etched 
in a glass slide. Preferably, such a channel is closed so that fluids may be pumped 
through it and has a depth sufficiently close to the diameter of the microparticles so 
that a monolayer of microparticles is trapped within a defined observation region. 

Identification of Novel Polynucleotides 
in cDNA Libraries 

Novel polynucleotides in a cDNA library can be identified by constructing a 
library of cDNA molecules attached to microparticles, as described above. A large 
fraction of the library, or even the entire library, can then be partially sequenced in 
parallel. After isolation of mRNA, and perhaps normalization of the population as 
taught by Soares et al, Proc. Natl. Acad. Sci., 91 : 9228-9232 (1994), or like 
references, the following primer may by hybridized to the polyA tails for first strand 
synthesis with a reverse transcriptase using conventional protocols (SEQ ID NO: 1 ): 
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5' -mRNA- [A] n -3' 

< ■■ lK -i'ITJ 19- [primer site] -GG [W, W, W, C] 9ACCAGCTGATC-5' 1 * ; 

where [W,W,W,C]q represents a tag as described above, ; "AGCAGCTGAT C" is an 
5 optional sequence forming a restriction site in double stranded form, and "primer site" 
is a sequence cornmon to all members of the library that is later used as a primer 
binding site foramplifying polynucleotides of sinterest by PCR. 

After reverse transcription and second strand synthesis by conventional 
techniques; the double stranded fragments are inserted into a clonihg vector aS' 1 ' " 
10 described above and amplified. The amplified library is then sampled and the sample 
amplified: The ^cloning vectors from the amplified sample are-isolated, and the tagged 
cDNA fragments excised and purified. After rendering /the tag single- stranded with a 
polymerase as described above, the fragments are methylated and sorted onto 1 
microparticles in accordance with the invention! freferably; as described above, -the 
1 5 cloning vector is constructed so that the. tagged cDN As can be excised with an 
endonuclease, such as Fok I, that will allow immediate sequencing by the preferred 
single base method after sorting and ligation to microparticles. ' 1 *> 

Stepwise sequencing is then carried out simultaneously on the whole library, 
or one or more large fractions of the library^ in accordance with the invention until a 

20 'sufficient number of nucleotides are identified on each' cDN A for unique 

representation in the genome of the organism from which the library is derived. For 
example, if the library is derived from mammalian mRNA then a randomly selected 
sequence 14-15 nucleotides long-is expected to have unique representation among the 
2-3 thousand megabases of the typical mammalian genome. Of course identification 

25 of far fewer nucleotides would be sufficient for unique representation in a library - 
derived from bacteria, or other lower organisms. Preferably, at least 20-30 
nucleotides are identified to ensure unique representation and to permit construction 
of a suitable primer as described below. The tabulated sequences may then be 
compared to known sequences to identify unique cDNAs. 

30 Unique cDNAs are then isolated by conventional techniques, e.g. constructing 

a probe from the PCR amplicon produced with primers directed to the prime site and 
the portion of the cDNA whose sequence was determined. The probe may then be 
used to identify the cDNA in a library using a conventional screening protocol. 

The above method for identifying new cDNAs may also be used to fingerprint 

3 5 mRNA populations, either in isolated measurements or in the context of a 
dynamically changing population. Partial sequence information is obtained 
simultaneously from a large sample, e.g. ten to a hundred thousand, or more, of 
cDNAs attached to separate microparticles as described in the above method. 
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Example! 

An exemplary tag T imtiy is constructed as fbllows'tb form the chemically 
5 synthesized 9-word tags of nucleotides A, GV and T defined by the formula: 

3'-TGGC-[ 4 (A,G,T) 9 ]-CCCCp 

where M [ 4 (A,G;T)9]" indicates a tag mixture where each tag consists of nine 4-mer 
10 words of A, G, and T; and M p" indicate a 5' phosphate. This mixture is ligated to the 
following right and left primer binding regipns (SEQ ID NO: 4 and SEQ ID NO 5): 



15 



5'- AGTGGCTGGGCATCGGACCG 5 1 - GGGGCCCAGTCAGCGTCGAT 

TCACCGACCCGTAGCCp ; - ... GGGTCAGTCpCAGCTA 



LEFT RIGHT 



The right and left primer binding regions are ligated to the above tag mixture, after 
which the single stranded portion of the ligated structure is filled with DNA 
20 polymerase then mixed with the right and left primers indicated below and amplified 
to give a tag library (SEQ ID NO: 6). 



30 



Left Primer 

AGTGGCTGGGCATCGGACCG 



AGTGGCTGGGCATCGGACCG - [ 4 (A, G, T) 9] -GGGGCCCAGTCAGCGTCGAT 
TCACCGACCCGTAGCCTGGC- [ 4 (A, G, T> 9] -CCCCGGGTCAGTCGCAGCTA 



CCCCGGGTCAGTCGCAGCTA- 5 1 
Right Primer 



35 The underlined portion of the left primer binding region indicates a Rsr II recognition 
site. The left-most underlined region of the right primer binding region indicates 
recognition sites for Bsp 1201, Apa I, and Eco O 1091, and a cleavage site for Hga I. 
The right-most underlined region of the right primer binding region indicates the 
recognition site for Hga I. Optionally, the right or left primers may be synthesized 

40 with a biotin attached (using conventional reagents, e.g. available from Clontech 
Laboratories, Palo Alto, CA) to facilitate purification after amplification and/or 
cleavage. 
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primer binding site Ppu MI site 

.1 I 

(plasmid) -5 ' ■ ^ AAAAGGAGG AG G C C T T G AT AG AG AGG AC C T - 

. r ... -TTTTCCTCCTeCGGAACTATCTCTCCT.GGA- J 



primer binding site 
I 

10 C -GTTTAAAC-GGATCC-TCTTCCTCTTCCTCTTCC- 3 ' - (plasmid) 

-CAAATTTG-CCTAGG-AGAAGGAGAAGGAGAAGG- 
1 t 

Bam HI site 

Pmel site 

15 

The plasmid is cleaved with Ppu MI and Pme I (to give a Rsr Il-compatible end and a 
flush end so that the insert is oriented) and then methylated with DAM methylase. 
The tag-containing construct is cleaved with Rsr II and then ligated to the open 
plasmid, after which the conjugate is cleaved with Mbo I and Bam HI to permit 
20 ligation and closing of the plasmid. The plasmid is then amplified and isolated and 
- used in accordance with the invention. 

Example 3 

Changes in Gene Expression Profiles in Liver Tissue of Rats 

25 Exposed to Various Xenobiotic Agents 

In this experiment, to test the capability of the method of the invention to 
detect genes induced as a result of exposure to xenobiotic compounds, the gene 
expression profile of rat liver tissue is examined following administration of several 
compounds known to induce the expression of cytochrome P-450 isoenzymes. The 

30 results obtained from the method of the invention are compared to results obtained 
from reverse transcriptase PCR measurements and immunochemical measurements of 
the cytochrome P-450 isoenzymes. Protocols and materials for the latter assays are 
described in Morris et al, Biochemical Pharmacology, 52: 781-792 (1996). 

Male Sprague-Dawley rats between the ages of 6 and 8 weeks and weighing 

35 200-300 g are used, and food and water are available to the animals ad lib. Test 
compounds are phenobarbital (PB), metyrapone (MET), dexamethasone (DEX), 
clofibrate (CLO), corn oil (CO), and P-naphthoflavone (BNF), and are available from 
Sigma Chemical Co. (St. Louis, MO). Antibodies against specific P-450 enzymes are 
available from the following sources: rabbit anti-rat CYP3 Al from Human Biologies, 

40 Inc. (Phoenix, AZ); goat anti-rat CYP4A1 from Daiichi Pure Chemicals Co. (Tokyo, 
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Japan); monoclonal mouse anti-rat CYP1A1, monoclonal mouse anti-rat CYP2C1 1, 
goat anti-rat CYP2E1, and monoclonal mouse,anti;*at CYP2B1 from Oxford 
Biochemical Itesear^ ; Jnc.(^o4^. Second Wi wtibodies,(g?at anti-rabbit IgG. 
rabbit anti-goat IgG and goat ami-mouse IgG) are available from Jackson 

5 ImmunoResearch^boratories (West G.rove, PA). . .. 

Animals are administered either PB (1 00 mg/kg), BNF (1 00 mg/kg), MET 
(100 mg/kg), DEX (100 mg/kg), or CLO (250 mg/kg) for 4 consecutive days via 
intraperitoneal injection following a dosing regimen similar to that described by 
Wang et al, Arch. Biochem. Biophys. 290: 355-361 (1991). Animals treated with 

10 H 2 0 and CO are used as controls. Two hours following4he last injection (day 4), 
animals are killed, and the livers are removed. Livers are immediately frozen and 
stored at -70°C. 

Total RNA is prepared from frozen liver tissue using a modification of the 
method described by Xie et al, Biotechniques, 1 1 : 326-327 (1991 ). Approximately 
5 1 00-200 mg of liver tissue is homogenized in the RNA extraction buffer described by 
Xie et al to isolate total RNA. The resulting RNA is reconstituted in 
diethylpyrocarbonate-treated water, quantified spectrophotometrically at 260 nm, and 
adjusted to a concentration of 100 ug/ml. Total RNA is stored in 
- diethylpyrocarbonate-treated water for up to 1 year at -70°C without any apparent 
degradation. RT-PCR and sequencing are performed on samples from these 
preparations. 

For sequencing, samples of RNA corresponding to about 0.5 ng of poly(A) + 
RNA are used to construct libraries of tag-cDNA conjugates following the protocol 
described in the section entitled "Attaching Tags to Polynucleotides for Sorting onto 
Solid Phase Supports," with the following exception: the tag repertoire is constructed 
from six 4-nucleotide words from Table II. Thus, the complexity of the repertoire is 

or about 2.6 x 105. For each tag-cDNA conjugate library constructed, ten samples 
of about ten thousand clones are taken for amplification and sorting. Each of the 
amplified samples is separately applied to a fixed monolayer of about 10* 10 urn 
diameter GMA beads containing tag complements. That is, the "sample" of tag 
complements in the GMA bead population on each monolayer is about four fold the 
total size of the repertoire, thus ensuring there is a high probability that each of the 
sampled tag-cDNA conjugates will find its tag complement on the monolayer. After 
the oligonucleotide tags of the amplified samples are rendered single stranded as 
described above, the tag-cDNA conjugates of the samples are separately applied to the 
monolayers under conditions that permit specific hybridization only between 
oligonucleotide tags and tag complements forming perfectly matched duplexes. 
Concentrations of the amplified samples and hybridization times are selected to 
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permit the loading of about 5 x 10 4 to 2 x 10 5 tag-cDNA conjugates on each bead 
where perfect matches occur. After ligation, 9-12 nucleotide portions of the attached 
cDNAs are determined in parallel by the single base sequencing technique described 
by Brenner in International patent application PCT/US95/03678. Frequency 
5 distributions for the gene expression profiles are assembled from the sequence 
information obtained from each of the ten samples. 

RT-PCRs of selected mRNAs corresponding to cytochrome P-450 genes and 
the constitutively expressed cyclophilin gene are carried out as described in Morris et 
al (cited above). Briefly, a 20 reaction mixture is prepared containing lx reverse 
1 0 transcriptase buffer (Gibco BRL); 1 0 riM dithiothreitol, 0.5 nM dNTPs, 2.5 oligo 
d(T)[5 primer, 40 units RNasin (Promega, Madison, WI), 200 units RNase H-reverse 
transcriptase (Gibco BRL), and 400 ng of total RNA (in diethylpyrocarbonate-treated 
water). The reaction is incubated for 1 hour at 37°C followed by inactivation of the 
enzyme at 95°C for 5 min. The resulting cDNA is stored at -20°C until used. For 
1 5 PCR amplification of cDNA, a 1 0 ^iL reaction mixture is prepared containing 1 Ox 
polymerase reaction buffer, 2 mM MgCl 2 , 1 unit Taq DNA polymerase (Perkin- 
Elmer, Norwalk, CT), 20 ng cDNA, and 200 nM concentration of the 5' and 3' 
specific PCR primers of the sequences described in Morris et al (cited above). PCRs 
-are carried out in a Perkin-Elmer 9600 thermal cycler for 23 cycles using melting, 
20 annealing, and extension conditions of 94°C for 30 sec, 56°C for 1 min., and 72°C 
for 1 min., respectively. Amplified cDNA products are separated by PAGE using 5% 
native gels. Bands are detected by staining with ethidium bromide. 

Western blots of the liver proteins are carried out using standard protocols 
after separation by SDS-PAGE. Briefly, proteins are separated on 10% SDS-PAGE 
25 gels under reducing conditions and immunoblotted for detection of P-450 isoenzymes 
using a modification of the methods described in Hams et al, Proc. Natl. Acad. Sci., 
88: 1407-1410 (1991). Protein are loaded at 50 ^ig/lane and resolved under constant 
current (250 V) for approximately 4 hours at 2°C. Proteins are transferred to 
nitrocellulose membranes (Bio-Rad, Hercules, CA) in 15 mM Tris buffer containing 
30 120 mM glycine and 20% (v/v) methanol. The nitrocellulose membranes are blocked 
with 2.5% BSA and immunoblotted for P-450 isoenzymes using primary monoclonal 
and polyclonal antibodies and secondary alkaline phosphatase conjugated anti-IgG. 
Immunoblots are developed with the Bio-Rad alkaline phosphatase substrate kit. 
The three types of measurements of P-450 isoenzyme induction showed 
35 substantial agreement. 



-43- 



JVQ 97/13877 



PCI7US96/16342 



APPENDIX la 

Exemplary computer program for generating 
minimally cross hybridizing sets 
(single stranded tag/single stranded tag complement) 



Program minxh 
c 



c 
c 



c 
c 



c 
c 

c 
c 
c 



c 
c 



integer*2 subl ( 6) , mset 1 ( 1000, 6) , mset2 ( 1000, 6 > 
dimension nbase < 6) 



write ( * f * ) 1 ENTER SUBUNIT LENGTH • 
read(*, 100)nsub 
100 format (il) 

open ( 1 , file- 1 sub4 . dat 1 , f orm= • formatted ' , status= ' new • ) 



nset=0 

do 7000 ml=l,3 
do 7000 m2=l, 3 
do 7000 m3=l, 3 
do 7000 m4=l, 3 
subl (l)=ml 
subl (2)=m2 
subl { 3) =m3 
subl (4)=m4 



ndiff=3 



Generate set of subunits differing from 
c subl by at least ndiff nucleotides, 

c Save in msetl . 



jj-1 

do 900 j=l,nsub 
900 msetld, j)=subl(j) 



do 1000 kl=l,3 
do 1000 k2=l,3 
do 1000 k3=l,3 
do 1000 k4=l,3 



nbase (l)=kl 
nbase(2)=k2 
nbase (3)=k3 
nbase (4 ) =k4 
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. 1200 
c 
c 

c 
c 

c 
c 
c 
c 

1100 

c 
c 

1000 

c 

c 

1325 



c 
c 
c 

c 
c 
c 
c 
c 
c 
c 
c 
c 
c 
c 
c 

c 
c 

1700 
c 



A T ; V i .... 

do 1200 j = l,nsub-' ■ - * 
I r . \ it :(»sub 1A j ). ;.eg.. 1, t ,..and . . nba se ( j ). f . ne . 1 .or. 
1 subl(j).eq.2 .and., nbase {"j") . he/2 .or. 

3 ' siib^'i y) :e f q'i^ ; 0smd.- '"nbaseSj ) .ne . 3) then 

' 1 * v.;,-*n«n+l; yi !t . i- a v . , , 

endif 
continue 



if {n.ge.,ndif f ) then 



If number of mismatches 
is greater than or equal 
to ndiff then record 
subunit in matrix mset 



do 1100 i=l / nsub 

msetl ( jj,i)=nbase(i) 
endif .. 



continue 



do 1325 j2=l,nsub 
mset2 (1, j2) =msetl (1, j2) 
mset 2 (2, j2) -msetl [2, j2) 



Compare subunit 2 from 
msetl with each successive 
subunit in msetl, i.e. 3, 
4,5, ... etc. Save those 
with mismatches .ge. ndiff 
in matrix mset2 starting at 
position 2. 

Next transfer contents 
of mset2 into msetl and 
start 

comparisons again this time 
starting with subunit 3. 
Continue until all subunits 
undergo the comparisons. 



npass=0 



continue 

kk=npass+2 

npass=npass+l 
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c 



2 
2 

1600 

1625 

1500 

c 
c 
c 
c 
c 
c 
c 

2000 



c 
c 

7009 

7008 
7010 

120 
7000 

c 
c 

c 



do 1500 m=npass+2, j j 
n=0 

do 1600 j=l,nsub 

if (msetl (npass+1, j) .eq. 1 .and.msetl (m, j ) . ne . 1 . or . 
'msetl < npass+1, j) .eq. 2 . and.msetl (m, j) .ne.2.or. 
msetl (npass+1, j) . eq . 3 . and .mset 1 <m, j ) .ne.3) then 
;h=*n'+l ; 
endif ' 
continue 
if (n.ge.ndif f ) then 

- kk=kk+l. 

- - do 1625 i=l,nsub 

mset2 (kk,i)=msetl (m,i) 

endif'"' 

continue . , 

kk is the number of subunits 
stored in mset2 

Transfer contents of mset2 
into msetl for next pass. 



do 2000 k=l., kk . 

do 2000. m=l, nsub- 

msetl (k,m)=mset2 {k,m) 
if (Ick.lt. jj) then 
jj=kk 

goto 1700' ' 
endif 



nset=nset+l 
write (1, 7009) 

'format {/} 
do 7008 k=l,kk 

write (1, 7010) (msetl ( k, m) , m=l , nsub) 
format(4il) 
write(*,*) . 
write(*, 120) kk,nset 

format (lx, 'Subunits in set- • , i5, 2x, ' Set No=\i5) 
continue 
close (1 ) 



end 
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APPENDIX lb 

Exemplary computer program for generating 
minimally cross hybridizing sets 
(single stranded tag/single stranded tag complement) 



Program tagN 

c ■ " . -:■ . 
c 

c . . .Program tagN generates minimally cross-hybridizing 

c* sets of subunits given i) N — subunit length, and ii) 

c an .initial, subunit sequence: tagN -assumes that only 

c - 3 of .the four natural nucleotides are used in the tags. 

c , - ' : - - 

c 

character*l subl (20) 

integer*2 mset (10000, 20) , nbase(20) 

c 
c 

-write (V*) 'ENTER SUBUNIT LENGTH* 

read (*, 100) nsub 
100 format (i2) 

c 
c 

write(*,.*) 'ENTER SUBUNIT SEQUENCE' 

read**, 110) (subl (k) , k=l,nsub) 
110 format (20al) 

c 
c 

- ndif f-10 



800 continue 

c 
c 



c 
c 



Let a=l c=2 g=3 & t=4 



do. 800 kk=l, nsub 

if (subl (kk) .eq. 'a* ) then 

mset (1, kk)=l 

endif 

if (subl (kk) .eq. 'c* ) then 
mset(l,kk)=2 
endif 

if (subl (kk) .eq. 'g' ) then 
mset(l,kk)=3 
endif 

if (subl (kk) .eq. ' t ' ) then 
mset ( 1, kk) =4 
endif 



c Generate set of subunits differing from 

c subl by at least ndiff nucleotides. 



do 1000 ki=l,3 
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do 1000 k2=l,3 
do 1000 k3=l,3 
- do, 1000 Jc4»l, 3 
do 1000 k5=l / 3 
do 1000 k6=l,3 
do 1000 k7=l,3 
" 1 do 1000 k8=l, 3 

do 1000 k9=l,3 
do 1000 klO-1,3 

do 1000 kll=l,3 
• do 1000 kl2=l, 3 

do 1000 kI3«l,3 
do 1000 k!4=l,3 
. do 1000 ki5=l, 3 
- ..do 1000 kl6=l, 3 
. / _ ■: do 1000 kl7 = l, 3 
■ : '/* ; , - do 1000 kl8=l,3 

do 1000 U9=l, 3 
do 1000 k20=l,3 

C . • - . , . , - 

c 

nbase { 1 ) =kl 
^ nbase(2)=k2 
nbase(3)=k3 
nbase ( 4 ) = k4 
nbase (5) =k5 
nbase ( 6) =k6 
nbase(7)=k7 
nbase(8)=k8 
nbase (9) =k9 
' .nbase (10}=kl0 
nbase (ll)-kll 
nbase ( 12 )=kl2 
nbase(13)=kl3 
nbase(14}=kl4 
nbase(15}=U5 
nbase(16)=kl6 * 
nbase(17)=kl7 
nbase(18)=kl8 
,nbase(19)=kl9 
nbase (20)=k20 



do 1250 nn^l, j j 



1200 
c 



n=0 

do 1200 j=l,nsub 

if (mset (nn, j ) .eq. 1 
mset (nn, j) .eq.2 
mset (nn, j ) .eq. 3 
mset (nn, j ) . eq . 4 
n=n+ 1 
endif 
continue 



. and. 
.and. 
. and. 
. and. 



nbase ( j ) 
nbase ( j ) 
nbase ( j ) 
nbase ( j ) 



ne 
ne 
ne 
ne 



, 1 
2 
3 
4) 



. or . 
. or . 
. or . 
then 



1250 
c 



if {n.lt.ndif f ) 
goto 1000 
endif 

continue 



then 



writer, 130) (nbase (i) ,i=l,nsub) , j j 
do 1100 i=l,nsub 
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1100 

c 

1000 

c 

c 

130 

120 

c 

c 

c 
c 
c 



inset ( j j, i)=nbase (i) 
continue 



continue 



write { * , * ) 

format ( lOx, 20 { lx, il ) , 5x, i5) 
write ( * , * ) 
write<*, 120) jj 

format (lx, 'Number of words= ! ,i5) 



end 
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APPENDIX Ic 

Exemplary com puter pro gr am for generatin g . 
minimally cross hybridizing sets 
, i . ; ^°H ble s^andejd. tag/single straxided tag complement) 



Program 3tagN 

c t - 

c 
c 
c 
c 

c .... 



c 



c 
c 



c 



'Program 3tagN -generates minimally cross-hybridizing ' 
j. sets of duplex subunits , given .i) N—subuni-t .length, 
and ii ) an' initial hombpurine sequence. 



character* 1 ,subl'{'20)- . T < ■■ .? • ..r 

integer*2 mset (10000, 20) , nbase.(20). 

c • , /• . i-.."iv. ^ . . . t ,-. 

write { * , * ) 'ENTER SUBUNIT LENGTH 1 ; 
> y .read.{*> 100)nsub. 
100 format (i2) 

c ' • . i/ t ; 

c 

write C*; *')■ 'ENTER 'SUBUNIT SEQUENCE a' & g only ' 
. . ... xead( V110) (subl < k).,.k=.l, nsub) 
110 format (20al) 



ndif f«10 



.Let a=l. and g=2 



• do 800 kk=l,.nsub ' 

if (subl (kk) .eq. 'a* ) then 
mset (l/kk)=l 
endif 

if (subl (kk) .eq. »g* ) then 
mset(l,kk)=2 
endif 

800 continue- 1 



do 1000 kl=l,3 
do 1000 k2=l,3 
do 1000 k3«l, 3 
do 1000 k4=l,3 
do 1000 k5-l, 3 
do 1000 k6*=l,3 
do 1000 k7«i,3 
do 1000 k8=l,3 
do 1000 k9=l,3 
do 1000 U0=l,3 

do 1000 kll=l,3 
do 1000 kl2=l, 3 
do 1000 kl3=l,3 
do 1000 kl4=l, 3 
do 1000 kl5=l,3 
do 1000 kl6«l, 3 
do 1000 kl7=l,3 
do 1000 kl8=l,3 
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c 
c 



do 1000 kl9=l,3 
do 1000 k20=l,3 



nbase (1) 
nbase (2) 
nbase (3) 
nbase (4 ) 
nbase (5) 
nbase (6). 
nbase (7) = 
nbase (8) 
nbase (9) = 
nbase (10) 
nbase ( 11 ) 
: hbase(12) 
nbase (13) 
nbase (14 ) 
nbase (15) 
nbase (16) 
*nbase{ 17) 
nbase ,( 18 ) 
nba s e U'9 j 
nbase.(20 } ) 



kl 
k2 
k3 
k4 
k5 
k6 
k7 
k8 
k9 

=kl0 
= kll 
=kl2 
=kl3 
«kl4 
«kl5 
=kl6 
«k!7; 
=kl8 
=kl9 
=k20 



L200 



: da 1250;.nn=l, j j 

do 1200 j=l / nsub 
if (mset ( nn, j ) . 
. - mset. {nn., j ) 
mset (nn, j ) . 
mset (nn, j ) . 
n=n+l 
endif 
.continue 



eq. 1 
eq.,2 
eq. 3 
eq. 4 



.and. nbase (j ). ne . 1 .or. 

.and. nbase ( j ),. ne. 2 .or., 

.and. nbase ( j ) . ne . 3 .or. 

.and. nbase ( j ) . ne. 4 ) then 



1250 
c 



1100 
c 

1000 
c 

130 



120 

c 

c 



if (n. It .ndif f ) then 
goto 1000 
endif 

cont inue - . /■ ., ( 

jj=jj*l v • 

write (*, 130) (nbase (i ) , i = l, nsub) , j j 
do 1100 i=l,nsub 

mset ( j j , i ) =nbase ( i ) 
continue 

continue 
write {*, *) 

format ( 10x,20 ( lx, il ) , 5x,i5) 
write (*.,*) 
write(*,120) jj 

format (Ix, 'Number of words=',i5) 



end 
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SEQUENCE "LISTING 1 "'" " " ""' " 

(li GENERAL, INFORMATION.- ? -V./.-. t *)■';.' 

(i) APPLICANT: David W. Martin, Jr. "... 

(ii) TITLE. OF INVENTION: Measurement of ' Gene ' Expression profiles in 
Toxicity Determination J ... , 

(iii) NUMBER OF SEQUENCES : ' ,! 7 ' * ' " " ' - ' ' ; ! * • '* J 

(iv) . CORRESPONDENCE . ADDRESS:: ..... 

(A) ADDRESSEE: Stephen C. Macevicz/ Lynx' Therapeutics, Inc 
. (B) STREET: 3832 . Bay. Center Place 

(C) CITY: Hayward ' ' 

(D) -STATE: California. \, ■.. = , . ^ 1 ■ v - 

(E) COUNTRY: USA 

. ! :<F) ZIP: 94545' - ■ - \ ~ , ' 

(v) COMPUTER READABLE FORM: . 

(A) - MEDIUM TYPE: 3.5 inch diskette '-■ " 

(B) COMPUTER: IBM compatible 

(C) OPERATING SYSTEM: Windows'" 3 . 1 s> ' " : ' ' ' ' " 

(D) SOFTWARE: Microsoft Word 5.1 

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: -r . " ■ ' ' >, 

(B) FILING DATE: 

(C) CLASSIFICATION: .» • ■ 

(vii) PRIOR APPLICATION DATA: 

(A) . APPLICATION NUMBER: PCT/US96/095 13 

(B) : FILING DATE: 06-JUN-96 



(vii) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: PCTVUS95/12 7 91 
. (B.).., FILING DATE: 12-OCT-95 

(viii) ATTORNEY /AGENT INFORMATION: 

(A) NAME: Stephen C. Macevicz 

(B) REGISTRATION NUMBER: 30,285 

(C) REFERENCE /DOCKET NUMBER: 813wo 



(ix) TELECOMMUNICATION INFORMATION: 
(A) TELEPHONE: (510) 670-9365 
'(B) TELEFAX: (510) 670-9302 



(2) INFORMATION FOR SEQ ID NO: 1: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 11 nucleotides 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



-52- 



WO 97/13877 



PCT/US96/1G342 



•'•(•xi ) : 'SEQUENCE DESCRIPTION: SEQ ID NO: 1: 



CTAGTCGACC.A-., v^,.-^^ '-.5-,-, A 1 .,-. - 



(2) INFORMATION FOR SEQ ID NO: 2: 

•-VrV" SEQUENCE. CHARACTERISTICS:-' 11 :A.r 
i (A) LENGTH: 11 nucleotides 

TYPE:' nucleic acid* ; ' : • • * - - : r ; 
. ,{C) STRANDEDNESS: single 

■'- v - • { D) TOPOLOGY : linear x 1 *" ' >0 - 5 ' : ' - V \ 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 



.NRRGAT.CYNN N 



(2) INFORMATION FOR SEQ ID ; NO: 3 :' ' 



(ii SEQUENCE CHARACTERISTICS : . 

(A) LENGTH: 38 nucleotides 

(B) TYPE:- -nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



. (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 



GAGGATGCCT TTATGGATCC ACTCGAGATC CCAATCCA. 38 



(2) INFORMATION FOR SEQ ID NO: 4: 



( i } SEQUENCE ■ CHARACTERISTICS : 

(A) LENGTH: 20 nucleotides 

(B) TYPE: nucleic. acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 



AGTGGCTGGG CATCGGACCG 2C 



(2) INFORMATION FOR SEQ ID NO: 5: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 nucleotides 

(B) TYPE: nucleic acid 
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(C) STRANDEDNESS: double 
ID) TOPOLOGY; . linear 



{xi) 'SEQUENCE DESCRIPTION: ^EQ ID NO: 5:' 



GGGGCCCAGT CAGCGTCGAT 



(2) , INFORMATION- .FOR..SEQ- ID NO: -6: • *.>.■,■«. r ■ ..r 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 nucleotides 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: ' single'" :r ' " : ■ " : : r " '"■ * 
<D) TOPOLOGY.: ..linear. . it ... ....... r M 

(xi) SEQUENCE DESCRIPTION: SEQ ID Nb: 6: !?i ' v 



ATCGACGCTG 'ACTGGGCCCC 



(2) INFORMATION FOR SEQ. ID NO: 7: * K . 

ti) SEQUENCE CHARACTERISTICS: ' ^ '*'" : *' * '' ' 

(A) LENGTH: 62. nucleotides 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double "* : "»" • 

(D) TOPOLOGY: linear 



{xi} SEQUENCE DESCRIPTION: SEQ ID f NO: 7 :' 



AAAAGGAGGA GGCCTTGATA 'gaGAGGACCT GTTTAAACGG ATCCTCTTCC 
TCTTCCTCTT CC ■ '* 
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I claim: 

1 . A method of determining the toxicity of a compound, the method comprising 
the steps of: " " ! "' *' - ? - - ^ *" 

5 administering the compound to a tesrorganism;^ . ( it 

extracting a population of mRNA molecules from each of one or more tissues 

of the test organism; - * ■ • * ( 

forming a separate population of cDN A molecules from each population of 
mRNA molecules from the one or more tissues such that each cDNA molecule of a 
1 0 separate population has an oligonucleotide tag attached, the oligonucleotide tags 
being selected from the same minimally cross-hybridizing set; ~ 

separately sampling each population of cDNA molecules such that 
substantially all different cDNA molecules within a separate population have different 
oligonucleotide tags attached; m ^ ii^ ; • , 

1 5 sorting the cDNA molecules of eabh separate population by specifically 

hybridizing the oligonucleotide tags with their respective complements, the respective 
complements being attached as uniform populations of substantially identical 
complements in spatially discrete regions on one or more solid phase supports; 

determining the. nucleotide sequence of a portion of each of the sorted cDNA 
20 molecules of each separate population to form a frequency distribution of expressed 
genes for each of the one or more tissues; and 

correlating the frequency distribution of expressed genes in each of the one or 
more tissues with the toxicity of the compound. 

25 2. The method of claim 1 wherein said oligonucleotide tag and said complement 
of said oligonucleotide tag are single stranded. 

3. The method of claim 2 wherein said oligonucleotide tag consists of a plurality 
of subunits, each subunit consisting of an oligonucleotide of 3 to 9 nucleotides in 

30 length and each subunit being selected from the same minimally cross-hybridizing set. 

4. The method of claim 3 wherein said one or more solid phase supports are 
microparticles and wherein said step of sorting said cDNA molecules onto the 
microparticles produces a subpopulation of loaded microparticles and a subpopulation 

35 of unloaded microparticles. 

5. The method of claim 4 further including a step of separating said loaded 
microparticles from said unloaded microparticles. 
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6. The method of claim 5 further including a step of repeating said steps of 
sampling, sorting, and separating until a number of said loaded microparticles is 
accumulated is at least 10,000. 

5 

7. The method of claim 6 wherein said number of loaded microparticles is at 
least 100,000. 

8. The method of claim 7 wherein said number of loaded microparticles is at 
10 least 500,000. > / ; 

9. The method x>t claim 5 fuilther including a^step of repeating said steps of 
sampling, sorting, and separating until a number of said loaded microparticles is ; 
accumulated is sufficient to estimate the relative abundance of a cDNA molecuIe 

5 present In said population at a frequency within the range of from 0.1% to 5% with a 
95% confidence limit no larger than 0.1% of said population. . J 

1 0. The method of claim 4 wherein said test organism is a mammalian tissue 
* culture. 



!0 



0 



1 1 . The method of claim 1 0 wherein said mammalian tissue culture comprises 
hepatocytes. 

12. The method of claim 4 wherein said test organism is an animal selected from 
the group consisting of rats, mice, hamsters, guinea pigs, rabbits, cats, dogs, pigs, and 
monkeys. 

1 3. The method of claim 12 wherein said one or more tissues are selected from the 
group consisting of liver, kidney, brain, cardiovascular, thyroid, spleen, adrenal, large 
intestine, small intestine, pancrease urinary bladder, stomach, ovary, testes, and 
mesenteric lymph nodes. 



14. A method of identifying genes which are differentially expressed in a selected 
5 tissue of a test animal after treatment with a compound, the method comprising the 
steps of: 

administering the compound to a test animal; 
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extracting a population of mRNA molecules from the selected tissue of the 
test animal; " " 

; - forming a population of cDNA molecules from the population of mRNA 
' * molecuies such that^ach cDNX molecule has-an oligonucleotide tag attached,- the 
. ; 5 ~ ; yv pIigQiiuclebtide; tags being selected, from the, same minimally cross-hybridizing set; 
sampling the population of cDNA molecules such that substantially all 
- different cDNA molecules have different oligonucleotide tags attached; 

sorting the cDNA molecules by specifically hybridizing the oligonucleotide 
tags with their respective complements, the respective complements being attached as 
10 uniform populations of substantially idem • 
"regions on one. or more solid phase supports; 

determining the nucleotide sequence of a portion of each of the sorted cDNA 
molecules tb form a frequency distribution of expressed genes; and 

\ identifying genes expressed in response to administering the compound by ; 
1 5 comparing the frequenting distribution of expressed genes of the selected tissue of the 
test animal with a frequency distribution of expressed genes of the selected tissue of a 
control animal. ,'■ ; 1 r , 

* 15. The method of claim 14 wherein said 'oligonucleotide tag and said 
20 complement of said oligonucleotide tag are single stranded. 

1 55. The method of claim 15 wherein said oligonucleotide tag consists of a 
plurality of subunits, each subunit consisting of an oligonucleotide of 3 to 9 
nucleotides in length and each subunit being selected from the same minimally cross- 
25 hybridizing set. 

1 7. The method of claim 1 6 wherein said one or more solid phase supports are 
microparticles and wherein said step of sorting said cDNA molecules onto the 
microparticles produces a subpopulation of loaded microparticles and a subpopulation 

30 of unloaded microparticles. 

1 8. The method of claim 1 7 farther including a step of separating said loaded 
microparticles from said unloaded microparticles. 

35 19. The method of claim 1 8 further including a step of repeating said steps of 
sampling, sorting, and separating until a number of said loaded microparticles is 
accumulated is at least 10,000. 
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20. . The method of claim 19, wherein said number of loaded microparticles is at 

least lodoob: / 7 ; " : : " l ' J r " i r ; ; : 

21. The method of claim 20 wherein said number of loaded mfcropailEicles is at 
least 500,000. • > > --^ ' *..' • ■ y\r>r [ . 

22. .The method of claim 18 fiirther including a step of repeating said steps of 
sampling, sorting, and separating until a number of said loaded microparticles is 
accumulated is sufficient to estimate the relative abundance of a cDNA molecule 
present in said population at a frequency within the range of from 0.1% to 5% with a 
95% confidence limit no larger than 0. 1 % of said population. 

23. The method of claim 17 wherein said test animal is selected fromithe group 
consisting of rats, mice, hamsters, guinea pigs, rabbits, cats, dogs, pigs, and monkeys. 

24. The method of claim 23 wherein said selected tissue is selected from the 
group consisting of liver, kidney, brain, cardiovascular, thyroid, spleen, adrenal, large 
intestine, small intestine, pancrease urinary bladder, stomach, ovary, testes, and 

- mesenteric lymph nodes. 

25. A use of the technique of massively parallel signature sequencing to determine 
the toxicity of a compound in a test organism, the use comprising the steps of: 

administering the compound to a test organism; 

extracting a population of mRNA molecules from each of one or more tissues 
of the test organism and forming a population of cDNA molecules for each of the one 
or more tissues; 

determining the nucleotide sequence of a portion of each of the cDNA 
molecules of each separate population using massively parallel signature sequencing 
to form a frequency distribution of expressed genes for each of the one or more 
tissues; and 

correlating the frequency distribution of expressed genes in each of the one or 
more tissues with the toxicity of the compound. 

26. The use of claim 25 wherein said test organism is a mammalian tissue culture. 

27. The use of claim 26 wherein said mammalian tissue culture comprises 
hepatocytes. 
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28. The use of claim 25 wherein said test organism is an animal selected from the 
group consisting of rats, mice, hamsters, guinea pigs, rabbits, cats, dogs, pigs, and 
monkeys. 

5 29. The use of claim 28 wherein said one or more tissues are selected from the 
group consisting of liver, kidney, brain, cardiovascular, thyroid, spleen, adrenal, large 
intestine, small intestine, pancrease urinary bladder, stomach, ovary, testes, and 
mesenteric lymph nodes. 

10 30. A use of the technique of massively parallel signature sequencing to identify 
genes which are differentially expressed in a test organism after treatment with a 
compound and which are correlated with toxicity of the compound, the use 
comprising the steps of: 

administering the compound to the test organism; 
1 5 extracting a population of mRNA molecules from a selected tissue of the test 

organism and forming a population of cDNA molecules; 

determining the nucleotide sequence of a portion of each of the cDNA 
molecules using massively parallel signature sequencing to form a frequency 
- distribution of expressed genes; 
20 identifying genes expressed in response to administering the compound by 

comparing the frequencing distribution of expressed genes of the selected tissue of the 
test organism with a frequency distribution of expressed genes of the selected tissue 
of a control organism; and 

determining whether the genes expressed in response to administering the 
25 compound are correlated with toxicity of the compound in the test organism. 
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110, 
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